COGNITION ANALYSIS

RESERVATION OF COPYRIGHTS 

[0001] The disclosure of this patent document contains material which is subject
to copyright protection. The copyright owner has no objection to the facsimile 
reproduction by anyone of the patent document or the patent disclosure, as it appears in 
the Patent and Trademark Office patent file or records, but otherwise reserves all 
copyright rights whatsoever. 

BACKGROUND 

[0002] Neuroimaging has been used to detect abnormalities in individuals who
suffer from neuropsychiatric disorders. However, the conventional methods for
evaluating neuropsychiatric disorders rely on outward signs or "exophenotypes" of
illness. The American Psychiatric Association's Diagnostic and Statistical Manual (DSM-IV,
1994) is an example of a diagnostic procedure that uses such exophenotypes. Further, a
number of psychiatric disorders are believed to be caused, at least in part, by genetic
components, the vast majority of which remain unidentified.

SUMMARY 

[0003] Methods for evaluating information about the structure and function of
neural circuits in the brain can be used for diagnosis and gene identification and,
accordingly, are of particular importance for medicine, pharmacology, and society.
Many of the methods and data management features described herein consolidate
relationships within multi-dimensional complex data sets, e.g., data sets that include
systems biology measures such as from neuroimaging, and, optionally, also genetic
measures, e.g., from the same individuals. Generally, this process can be applied to any
multi-variable space, and not just quantitative measures from the brain or genome. In the
context of neuroimaging and genetics, this process is one that has direct implications for
the identification of genes for susceptibility and/or resistance to functional brain illness.

[0004] Accordingly, in one aspect, the invention features a datastructure that
includes: a) genetic information that describes a plurality of genetic markers of a subject
or a reference to such information; and b) a systems biology map of the subject or a
reference to such a map, e.g., wherein the map includes information about neural circuit
function in the brain. The datastructure can be encoded in machine-accessible media or
memory. The datastructure can also be transmitted, e.g., as a signal (e.g., a modulated or
encoded signal) or other communication, e.g., electronically or digitally.
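
By way of non-limiting illustration, one possible in-memory layout for such a datastructure is sketched below. The class name `SubjectRecord`, its field names, the marker names, and the region names are hypothetical and are chosen only for this example; the monetary reward paradigm is one of the exemplary paradigms named in this disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SubjectRecord:
    """Hypothetical container pairing genetic data with a systems biology map."""
    # Marker name -> genotype call; None when only a reference is stored.
    genetic_markers: Optional[Dict[str, str]] = None
    # Reference to genetic information held elsewhere (e.g., a storage key).
    genetic_reference: Optional[str] = None
    # Paradigm name -> {region name -> activity value}.
    systems_biology_map: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def has_genetic_information(self) -> bool:
        """True if markers are stored directly or by reference."""
        return self.genetic_markers is not None or self.genetic_reference is not None


record = SubjectRecord(
    genetic_markers={"marker_1": "A/G", "marker_2": "C/C"},
    systems_biology_map={"monetary_reward": {"region_a": 0.8, "region_b": -0.1}},
)
```

In this sketch the map is keyed by region name rather than by voxel, so that each value can be retrieved without reference to a raster.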

[0005] In one embodiment, the systems biology map includes structural
information (e.g., only structural information).

[0006] In one embodiment, the systems biology map includes functional
information. For example, the systems biology map includes information about activity
in a plurality of brain regions in at least one mental process, e.g., a paradigm, e.g., in at
least two, three, four, or five paradigms. The paradigm typically includes an external
framework with which a mental process interacts. For example, the mental process can
be made to interact with an external stimulus, an external request, an external task, or an
external sequence.

[0007] In one embodiment, the plurality of brain regions includes at least five,
ten, twenty, thirty, forty, fifty, or sixty brain regions. For example, at least one, ten,
twenty, or thirty of the brain regions of the plurality are selected from Table 1.
Subregions or smaller volumes than the exemplary regions in Table 1 can also be used, as
can regions that are defined by larger volumes and encompass one or more of the
exemplary regions.

[0008] In one embodiment, the information for each of the brain regions is
independent of reference to a coordinate frame. For example, the brain regions can be
identified by a numerical index (e.g., an index value for each of a set of predefined
regions) or by text (e.g., a categorical reference) or an indirect reference (e.g., use of
pointers and hyperlinks). In another embodiment, one or more of the brain regions can be
identified by reference to a coordinate frame, e.g., Talairach coordinates. Even then,
however, the information is, for example, not indexed voxel by voxel, so that it is not in
the form of a raster, i.e., the information is non-rasterized.

[0009] In one embodiment, the paradigm interacts with the informational
backbone for motivation, e.g., it evokes at least one region in the informational backbone
for motivation. In one embodiment, the paradigm evokes at least one region in the
informational backbone for motivation. In one embodiment, the paradigm interacts with
mechanisms for representation and convergence, feature evaluation, probability
assessment, outcome processing, valuation, reward/aversion processing, counterfactual
comparisons, and memory. In one embodiment, the paradigm interacts with mechanisms
for selection of objectives for fitness, mechanisms for selection of behavior, or
information processing (e.g., reception). In one embodiment, the paradigm interacts with
mechanisms for language and symbol processing, mechanisms for communication,
and/or mechanisms for social behavior.

[0010] In one embodiment, the systems biology map can include
information obtained by imaging, e.g., neuroimaging, e.g., tomography, e.g., using an MRI, fMRI,
MEG, fCT, OI, SPECT, or PET system. Neuroimaging can include imaging at least
one region of the brain or central nervous system.

[0011] In an embodiment in which there is information for at least two paradigms,
these paradigms may interact with overlapping, but non-coextensive regions of the brain,
e.g., each paradigm may interact with at least one region that is not activated in another
paradigm by a normal subject. Exemplary paradigms include: a social reward paradigm,
a CPT / probability paradigm, a physiological aversion / pain paradigm, a mental rotation
paradigm, an emotional faces paradigm, and a monetary reward paradigm. Other
paradigms can also be used. For example, another paradigm which interrogates the
informational backbone for motivation or other areas described herein, e.g., an area
interrogated by one of the above paradigms, can be used.

[0012] In one embodiment, the information about activity for at least one of the 
regions includes deviations from a reference (e.g., percentage differences, ratios, and 
subtractive values). 
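
As a simple, non-limiting illustration, the three kinds of deviation named above can be computed from a measured value and a reference value as sketched below. The function name `deviations` and the example numbers are hypothetical.

```python
def deviations(value: float, reference: float) -> dict:
    """Express a deviation from a reference as a difference, a ratio, and a percentage."""
    if reference == 0:
        raise ValueError("reference must be non-zero for ratio and percentage")
    return {
        "subtractive": value - reference,
        "ratio": value / reference,
        "percent_difference": 100.0 * (value - reference) / reference,
    }


# Illustrative numbers only: a measured signal of 105 against a reference of 100.
result = deviations(value=105.0, reference=100.0)
```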

[0013] In one embodiment, the systems biology map includes a plurality of
matrices, each matrix including information about neural activity in a plurality of defined
brain regions during different paradigms. In another embodiment, the map includes a
similar or identical set of information, but is stored or represented in another form, e.g.,
as text, graphic, e.g., as a vector, table, etc.

[0014] For example, the plurality of genetic markers includes markers on at least
two, three, four, five, six, ten, twelve, or fifteen different, non-homologous chromosomes.
In one embodiment, the plurality of genetic markers includes markers on each autosome,
e.g., at least one, two, five, ten, twenty, or fifty markers on each autosome. For example,
at least 20, 50, or 70% of the markers can be spaced closer than 500, 50, 20, 10, or 2 Mb
to another marker or 200, 100, 50, 20, or 10 cM to another marker.
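
For illustration only, the spacing condition can be checked as sketched below, assuming that each marker is given as a chromosome label and a physical position in Mb and that spacing is measured only between markers on the same chromosome. The function name `fraction_within` and the example positions are hypothetical.

```python
from collections import defaultdict


def fraction_within(markers, max_gap_mb):
    """Fraction of markers lying closer than max_gap_mb to another marker
    on the same chromosome. `markers` is a list of (chromosome, position_mb)."""
    by_chrom = defaultdict(list)
    for chrom, pos in markers:
        by_chrom[chrom].append(pos)
    close = 0
    for positions in by_chrom.values():
        positions.sort()
        for i, pos in enumerate(positions):
            # Distance to the nearest neighbor on either side, if any.
            left = pos - positions[i - 1] if i > 0 else float("inf")
            right = positions[i + 1] - pos if i + 1 < len(positions) else float("inf")
            if min(left, right) < max_gap_mb:
                close += 1
    return close / len(markers) if markers else 0.0


markers = [("chr1", 1.0), ("chr1", 2.5), ("chr1", 40.0), ("chr2", 10.0), ("chr2", 11.0)]
share = fraction_within(markers, max_gap_mb=2.0)  # four of the five markers qualify
```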

[0015] The genetic information can include information about, e.g., nucleotide
identity for a plurality of genetic markers, methylation status for a plurality of genetic
markers, parental origin for one or a plurality of genetic markers, chromatin structure or
accessibility for one or a plurality of genetic markers, a haplotype, microsatellite marker,
sequence tagged site, SNP, a chromosomal deletion, inversion, transversion,
rearrangement, trisomy, or other chromosomal abnormality.

[0016] In another aspect, the invention features a datastructure that includes: a
systems biology map of a subject wherein the map includes quantitative information
about neural circuit function in the brain. For example, the information indicates function
of a plurality of regions of the brain during a plurality of mental processes.

[0017] In one embodiment, the systems biology map includes structural
information (e.g., only structural information).

[0018] In one embodiment, the systems biology map includes functional
information. For example, the systems biology map includes information about activity
in a plurality of brain regions in at least one mental process, e.g., a paradigm, e.g., in at
least two, three, four, or five paradigms. In one embodiment, the plurality of brain
regions includes at least five, ten, twenty, thirty, forty, fifty, or sixty brain regions. For
example, at least one, ten, twenty, or thirty of the brain regions of the plurality are
selected from Table 1. Subregions or smaller volumes than the exemplary regions in
Table 1 can also be used, as can regions that are defined by larger volumes and
encompass one or more of the exemplary regions.

[0019] In one embodiment, the information for each of the brain regions is
independent of reference to a coordinate frame. For example, the brain regions can be
identified by a numerical index (e.g., an index value for each of a set of predefined
regions) or by text (e.g., a categorical reference) or an indirect reference (e.g., use of
pointers and hyperlinks). In another embodiment, one or more of the brain regions can be
identified by reference to a coordinate frame, e.g., Talairach coordinates. Even then,
however, the information is, for example, not indexed voxel by voxel, so that it is not in
the form of a raster, i.e., the information is non-rasterized.

[0020] In one embodiment, the paradigm interacts with the informational
backbone for motivation, e.g., it evokes at least one region in the informational backbone 
for motivation. In one embodiment, the paradigm evokes at least one region in the 
informational backbone for motivation. In one embodiment, the paradigm interacts with 
mechanisms for representation and convergence, feature evaluation, probability 
assessment, outcome processing, valuation, reward/aversion processing, counterfactual 
comparisons, and memory. In one embodiment, the paradigm interacts with mechanisms 
for selection of objectives for fitness, mechanisms for selection of behavior, or 
information processing (e.g., reception). In one embodiment, the paradigm interacts with 
mechanisms for language and symbol processing, mechanisms for communication, 
and/or mechanisms for social behavior. 

[0021] The systems biology map can include information obtained by imaging,
e.g., neuroimaging, e.g., tomography, e.g., using an MRI, fMRI, MEG, fCT, OI, SPECT, or PET
system.

[0022] In an embodiment in which there is information for at least two paradigms,
these paradigms may interact with overlapping, but non-coextensive regions of the brain, 
e.g., each paradigm may interact with at least one region that is not activated in another 
paradigm by a normal subject. 

[0023] Exemplary paradigms include: a social reward paradigm, a CPT /
probability paradigm, a physiological aversion / pain paradigm, a mental rotation
paradigm, an emotional faces paradigm, and a monetary reward paradigm. Other
paradigms can also be used. For example, another paradigm which interrogates the
informational backbone for motivation or other areas described herein, e.g., an area
interrogated by one of the above paradigms, can be used.

[0024] In one embodiment, the information about activity for at least one of the 
regions includes deviations from a reference (e.g., percentage differences, ratios, and 
subtractive values). 

[0025] In one embodiment, the systems biology map includes a plurality of 
matrices, each matrix including information about neural activity in a plurality of defined
brain regions during different paradigms. In another embodiment, the map includes a 
similar or identical set of information, but is stored or represented in another form, e.g., 
as text, graphic, e.g., as a vector, table, etc. 

[0026] The datastructure can further include genetic information that describes a 
plurality of genetic markers of the subject or a reference to such information. For 
example, the plurality of genetic markers includes markers on at least two, three, four,
five, six, ten, twelve, or fifteen different, non-homologous chromosomes. In one 
embodiment, the plurality of genetic markers includes markers on each autosome, e.g., at 
least one, two, five, ten, twenty, or fifty markers on each autosome. For example, at least 
20, 50, or 70% of the markers can be spaced closer than 500, 50, 20, 10, or 2 Mb to 
another marker or 200, 100, 50, 20, or 10 cM to another marker. 

[0027] The genetic information can include information about, e.g., nucleotide
identity for a plurality of genetic markers, methylation status for a plurality of genetic
markers, parental origin for one or a plurality of genetic markers, chromatin structure or
accessibility for one or a plurality of genetic markers, a haplotype, microsatellite marker,
sequence tagged site, SNP, a chromosomal deletion, inversion, transversion,
rearrangement, trisomy, or other chromosomal abnormality.

[0028] The datastructure can be encoded in machine-accessible media or
memory. The datastructure can also be transmitted, e.g., as a signal (e.g., a modulated or
encoded signal) or other communication, e.g., electronically or digitally.

[0029] In another aspect, the invention features a datastructure that includes: a
systems biology map of a subject wherein the map includes a plurality of values
corresponding to a set of continuous variables, wherein the variables of the set
correspond to different regions of the brain, and the values that correspond to the
variables indicate function of respective regions during a mental process.

[0030] In one embodiment, the systems biology map includes structural
information (e.g., only structural information).

[0031] In one embodiment, the systems biology map includes functional 
information. For example, the systems biology map includes information about activity 
in a plurality of brain regions in at least one mental process, e.g., a paradigm, e.g., in at 
least two, three, four, or five paradigms. In one embodiment, the plurality of brain 
regions includes at least five, ten, twenty, thirty, forty, fifty, or sixty brain regions. For 
example, at least one, ten, twenty, or thirty of the brain regions of the plurality are 
selected from Table 1. Subregions or smaller volumes than the exemplary regions in 
Table 1 can also be used, as can regions that are defined by larger volumes and 
encompass one or more of the exemplary regions. 

[0032] In one embodiment, the information for each of the brain regions is
independent of reference to a coordinate frame. For example, the brain regions can be
identified by a numerical index (e.g., an index value for each of a set of predefined
regions) or by text (e.g., a categorical reference) or an indirect reference (e.g., use of
pointers and hyperlinks). In another embodiment, one or more of the brain regions can be
identified by reference to a coordinate frame, e.g., Talairach coordinates. Even then,
however, the information is, for example, not indexed voxel by voxel, so that it is not in
the form of a raster, i.e., the information is non-rasterized.

[0033] In one embodiment, the paradigm interacts with the informational
backbone for motivation, e.g., it evokes at least one region in the informational backbone
for motivation. In one embodiment, the paradigm evokes at least one region in the
informational backbone for motivation. In one embodiment, the paradigm interacts with
mechanisms for representation and convergence, feature evaluation, probability
assessment, outcome processing, valuation, reward/aversion processing, counterfactual
comparisons, and memory. In one embodiment, the paradigm interacts with mechanisms
for selection of objectives for fitness, mechanisms for selection of behavior, or
information processing (e.g., reception). In one embodiment, the paradigm interacts with
mechanisms for language and symbol processing, mechanisms for communication,
and/or mechanisms for social behavior.

[0034] In one embodiment, the systems biology map is condensed relative to a
native dataset (e.g., a rasterized dataset), e.g., at least 10, 10^2, 10^3, 10^4, 10^5, or 10^6 fold.
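
As a rough, non-limiting illustration of how a degree of condensation might be estimated, the sketch below divides the number of values in a rasterized time series by the number of values retained in a region-indexed map. The function name `condensation_fold`, the image dimensions, and the number of timepoints are assumptions made only for this example.

```python
def condensation_fold(native_shape, n_timepoints, n_regions, values_per_region=1):
    """Ratio of the count of native (rasterized) values to the count of condensed values."""
    x, y, z = native_shape
    native_count = x * y * z * n_timepoints
    condensed_count = n_regions * values_per_region
    return native_count / condensed_count


# Assumed example: a 64 x 64 x 30 voxel volume sampled at 100 timepoints,
# condensed to one value for each of sixty brain regions.
fold = condensation_fold(native_shape=(64, 64, 30), n_timepoints=100, n_regions=60)
```

Under these assumed dimensions the map is condensed by a factor on the order of 10^5.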

[0035] In an embodiment in which there is information for at least two paradigms,
these paradigms may interact with overlapping, but non-coextensive regions of the brain, 
e.g., each paradigm may interact with at least one region that is not activated in another 
paradigm by a normal subject. 

[0036] Exemplary paradigms include: a social reward paradigm, a CPT /
probability paradigm, a physiological aversion / pain paradigm, a mental rotation
paradigm, an emotional faces paradigm, and a monetary reward paradigm.

[0037] Other paradigms can also be used. For example, another paradigm which
interrogates the informational backbone for motivation or other areas described herein,
e.g., an area interrogated by one of the above paradigms, can be used.

[0038] In one embodiment, the information about activity for at least one of the 
regions includes deviations from a reference (e.g., percentage differences, ratios, and 
subtractive values). 

[0039] In one embodiment, the systems biology map includes a plurality of 
matrices, each matrix including information about neural activity in a plurality of defined
brain regions during different paradigms. In another embodiment, the map includes a 
similar or identical set of information, but is stored or represented in another form, e.g., 
as text, graphic, e.g., as a vector, table, etc. 

[0040] The datastructure can further include genetic information that describes a
plurality of genetic markers of the subject or a reference to such information. For
example, the plurality of genetic markers includes markers on at least two, three, four,
five, six, ten, twelve, or fifteen different, non-homologous chromosomes. In one
embodiment, the plurality of genetic markers includes markers on each autosome, e.g., at
least one, two, five, ten, twenty, or fifty markers on each autosome. For example, at least
20, 50, or 70% of the markers can be spaced closer than 500, 50, 20, 10, or 2 Mb to
another marker or 200, 100, 50, 20, or 10 cM to another marker.

[0041] The genetic information can include information about, e.g., nucleotide
identity for a plurality of genetic markers, methylation status for a plurality of genetic 
markers, parental origin for one or a plurality of genetic markers, chromatin structure or 
accessibility for one or a plurality of genetic markers, a haplotype, microsatellite marker, 
sequence tagged site, SNP, a chromosomal deletion, inversion, transversion, 
rearrangement, trisomy, or other chromosomal abnormality. 

[0042] In one embodiment, the datastructure further includes c) information that
is an index corresponding to the subject. For example, the index can be randomized,
encrypted, or anonymous. In another example, the index directly identifies the subject
(e.g., name, social security number, etc.). In one embodiment, the index associates the
subject with familial or other pedigree information.
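
By way of example only, a randomized index that does not itself identify the subject could be assigned as sketched below, using the `secrets` module of the Python standard library. The function name `assign_indices`, the placeholder subject labels, and the pedigree table are hypothetical; in practice the table linking identities to indices would be held separately from the records.

```python
import secrets


def assign_indices(subject_names):
    """Map each subject to a random 16-character hexadecimal index."""
    lookup = {}
    for name in subject_names:
        index = secrets.token_hex(8)
        while index in lookup.values():   # regenerate in the unlikely event of a collision
            index = secrets.token_hex(8)
        lookup[name] = index
    return lookup


indices = assign_indices(["subject_one", "subject_two"])
# Hypothetical pedigree table keyed by anonymous index rather than by identity.
pedigree = {index: "family_1" for index in indices.values()}
```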

[0043] The datastructure can be encoded in machine-accessible media or
memory. The datastructure can also be transmitted, e.g., as a signal (e.g., a modulated or
encoded signal) or other communication, e.g., electronically or digitally.

[0044] The invention also features a database including: a plurality of records,
wherein each record of the plurality includes a datastructure described herein or other
datastructure which is condensed relative to native information (e.g., rasterized data)
obtained from subjects at a plurality of time points. In one embodiment, the datastructure
is accessible to statistical analysis (e.g., uncompressed) and enables phenotypic
classification of subjects.

[0045] In one embodiment, the records of the plurality include records for a 
plurality of unrelated individuals and records for at least one biological family member of 
each of the plurality of unrelated individuals. For example, at least 5, 10, 20, 30, or 50%
of the database can include records for individuals for which there is also a record for a 
biologically related family member. 






Attorney Docket No. 00786-813P01 



[0046] In one embodiment, the database includes records for at least 50, 100, 200,
500, 1000, 3000 or 30,000 human subjects, or ranges therebetween. In one embodiment,
the database includes records for individuals from different populations (e.g., ethnic
populations, e.g., at least two, three, or four different continents, e.g., Caucasians,
Africans, Polynesians, Native Americans, and Asians).

[0047] In one embodiment, the database includes records for at least 50, 100, 200,
500, 1000, 3000 or 10,000 human subjects who each have a clinical diagnosis of a
neurological and/or psychiatric disorder, e.g., schizophrenia, manic depression, bipolar
disorder, addictions (e.g., substance abuse, gambling, etc.), obsessive-compulsive
disorder, anxiety/paranoia, autism, schizo-affective disorder, delusional disorder,
psychosis, antisocial personality disorder, or anorexia/bulimia nervosa. For example, the
database can include at least 50, 100, 200, 500, 1000, 3000 or 10,000 records for a single
disorder.

[0048] For example, the datastructure is a condensed form of a native dataset
(e.g., a rasterized dataset). For example, the datastructure is condensed at least 10, 10^2,
10^3, 10^4, 10^5, or 10^6 fold.

[0049] In one embodiment, the datastructure further includes genetic information
or a reference to such information. The datastructure can include other features described
herein.

[0050] In another aspect, the invention features a method that includes: providing
a database that includes information about brain activity (e.g., structural and/or functional
information) for each of a plurality of subjects (e.g., a database described herein); and

[0051] classifying the subjects based on the information.

[0052] In one embodiment, the classifying includes selecting a subset of
variables, and sorting the subjects as a function of the variables of the subset. For
example, the subset of variables can be selected based on the information content (e.g.,
relative information content) of each of the variables. For example, the subset of
variables can be selected based on correlations (e.g., autocorrelations) among the
variables. In one embodiment, each variable is associated with an activity of a brain
region and a mental process, e.g., a paradigm.
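
For purposes of illustration only, one way to select a subset of variables by information content and then sort subjects is sketched below. The sketch assumes that information content is measured as the Shannon entropy of each variable after equal-width binning, which is only one of many possible measures. The function names, the number of bins, the region names, and the data values are hypothetical.

```python
import math
from collections import Counter


def entropy(values, n_bins=4):
    """Shannon entropy (in bits) of the values after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0                       # a constant variable carries no information
    width = (hi - lo) / n_bins
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(bins).values())


def select_variables(table, k):
    """Return the k variables with the highest entropy.
    `table` maps a variable to a list of values, one per subject."""
    ranked = sorted(table, key=lambda name: entropy(table[name]), reverse=True)
    return ranked[:k]


def sort_subjects(table, variables):
    """Order subject positions by their values on the selected variables."""
    n_subjects = len(next(iter(table.values())))
    return sorted(range(n_subjects), key=lambda i: tuple(table[v][i] for v in variables))


# Each variable pairs a brain region with a paradigm; values are illustrative.
table = {
    ("region_a", "monetary_reward"): [1.0, 9.0, 4.0, 6.0],
    ("region_b", "monetary_reward"): [5.0, 5.0, 5.0, 5.0],
    ("region_a", "mental_rotation"): [2.0, 2.0, 8.0, 8.0],
}
chosen = select_variables(table, k=2)
order = sort_subjects(table, chosen)
```

In this example the constant variable is discarded because it does not distinguish among the subjects.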

[0053] In one embodiment, the classifying includes generating, evaluating, or
characterizing a tree, e.g., a binary tree. For example, each node of the tree corresponds 
to a variable associated with a particular region of the brain and a mental process. 

[0054] In one embodiment, the classifying is recursive. 
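
A minimal sketch of a recursive classification that generates a binary tree is given below, solely as an illustration. It assumes that subjects are split at the median of each variable in turn, which is a simplifying assumption and not a splitting rule stated in this disclosure. The function name `build_tree`, the dictionary layout, and the data values are hypothetical.

```python
from statistics import median


def build_tree(rows, variables, min_size=2):
    """Recursively split subjects into a binary tree.
    `rows` is a list of dicts mapping a variable to a value for one subject."""
    if not variables or len(rows) < 2 * min_size:
        return {"subjects": rows}                     # terminal node
    variable, rest = variables[0], variables[1:]
    cut = median(row[variable] for row in rows)
    low = [row for row in rows if row[variable] < cut]
    high = [row for row in rows if row[variable] >= cut]
    if len(low) < min_size or len(high) < min_size:
        return {"subjects": rows}                     # split too uneven; stop here
    return {
        "variable": variable,
        "threshold": cut,
        "low": build_tree(low, rest, min_size),       # recursion on each branch
        "high": build_tree(high, rest, min_size),
    }


v = ("region_a", "monetary_reward")   # a brain region paired with a paradigm
rows = [{v: 1.0}, {v: 2.0}, {v: 8.0}, {v: 9.0}]
tree = build_tree(rows, [v])
```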

[0055] In one embodiment, the plurality of subjects includes at least 50, 100, 200,
500, 1000, or 3000 human subjects.

[0056] In one embodiment, the classifying includes an association rule algorithm.
For example, the association rule algorithm is non-parametric.

[0057] In one embodiment, the classifying includes a classification tree analysis, 
hierarchical clustering, Bayesian clustering, k-means clustering, self-organizing maps, 
and/or shortest path analysis. 

[0058] In one embodiment, the method further includes comparing genetic 
information among subjects of at least one class, e.g., evaluating a statistic for association 
of one or more genetic markers among the subjects of the at least one class. 
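
As one non-limiting example of such a statistic, the sketch below computes the Pearson chi-square statistic for a two-by-two table of marker carriers and non-carriers inside and outside a class. The choice of this particular statistic, the function name `chi_square_2x2`, and the counts are assumptions made for illustration.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a non-zero total")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Rows: subjects in the class versus all other subjects.
# Columns: carriers versus non-carriers of a given marker allele.
statistic = chi_square_2x2(30, 10, 20, 40)
```

A larger statistic indicates a stronger departure from independence between class membership and the marker allele.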

[0059] In one embodiment, the information includes quantitative volumetric data
evaluated by tomography, e.g., MRI, e.g., fMRI or mMRI. The quantitative volumetric
data can include a plurality of matrices.

[0060] In one embodiment, the subjects are social non-human animals, e.g.,
non-human primates or voles. In one embodiment, the subjects are voles.

[0061] In another aspect, the invention features a method that includes: providing
a database that includes quantitative information about brain function for each of a
plurality of subjects; and

[0062] identifying, e.g., objectively identifying, a subset of subjects from the
plurality of subjects according to similarity of brain function. In one embodiment, a
plurality of subsets are objectively identified.

[0063] In one embodiment, the identifying includes selecting, e.g., objectively
selecting, a subset of quantitative variables whose values vary among the plurality of
subjects.

[0064] In one embodiment, the method further includes receiving additional 
quantitative information about brain function for at least one additional subject, and 
evaluating whether the additional subject is a member of the identified subset. 

[0065] For example, the identifying includes generating one or more association
rules that model the subset; a decision tree that models the subset; and a probability
function that models the subset. In one embodiment, the database includes systems
biology maps. For example, the systems biology maps include values determined by
evaluating subjects during at least two different mental processes.

[0066] In another aspect, the invention features a data-tree that includes a
plurality of nodes, wherein each non-terminal node includes (i) a reference to a variable
or variable class, wherein the variable or variable class is a parameter of brain function in
the subject, (ii) optionally, a node level, and (iii) a criterion for distinguishing descendants of
the node.

[0067] For example, the tree is a binary tree. In one embodiment, each
non-terminal node includes a pointer to one or more descendant nodes.
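
By way of illustration, a node of a binary data-tree of this kind could be represented as sketched below, with a variable reference, a node level, a criterion, and pointers to descendant nodes. The class name `Node`, its field names, the threshold, and the class labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Node:
    """Hypothetical node of a binary data-tree."""
    level: int = 0
    # (brain region, paradigm) referenced by a non-terminal node.
    variable: Optional[Tuple[str, str]] = None
    # Criterion mapping the variable's value to True (left) or False (right).
    criterion: Optional[Callable[[float], bool]] = None
    left: Optional["Node"] = None      # pointers to descendant nodes
    right: Optional["Node"] = None
    label: Optional[str] = None        # set only on terminal nodes


def classify(node: Node, subject: dict) -> str:
    """Descend recursively from `node` to a terminal node and return its label."""
    if node.criterion is None:         # terminal node: no criterion to apply
        return node.label
    child = node.left if node.criterion(subject[node.variable]) else node.right
    return classify(child, subject)


root = Node(
    level=0,
    variable=("region_a", "monetary_reward"),
    criterion=lambda value: value < 0.5,
    left=Node(level=1, label="class_1"),
    right=Node(level=1, label="class_2"),
)
```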

[0068] In one embodiment, for at least some of the nodes of the plurality, the
criterion is an association rule. In one embodiment, each descendant node can be defined
by a function, e.g., a probabilistic or statistical function, that differentiates it from a
sibling descendant node. In one embodiment, the nodes are ordered as a function of
variables that they respectively reference, e.g., as a function of information content or
autocorrelations for the respective variables. For example, at least one of the variables or
variable classes refers to a brain region in a paradigm.

[0069] In another aspect, the invention features a datastructure including a
plurality of matrices, wherein each matrix includes functional information obtained
during a mental process of a subject, the matrix including at least two dimensions, a first
dimension that identifies regions of the brain, and one or more values for each region,
wherein the values correspond to activity levels in the respective regions during the
mental process. In one embodiment, the second dimension identifies left/right
hemisphere.
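
As a non-limiting sketch, a plurality of such matrices can be held as one regions-by-hemispheres matrix per paradigm, as shown below. The region names, the activity values, and the convention of storing 0.0 where no value was supplied are assumptions made for this example.

```python
REGIONS = ["region_a", "region_b", "region_c"]     # placeholder region names
HEMISPHERES = ["left", "right"]


def make_matrix(activity):
    """Build a regions x hemispheres matrix from {(region, hemisphere): value}.
    Entries that are not supplied default to 0.0 (an assumption of this sketch)."""
    return [[activity.get((r, h), 0.0) for h in HEMISPHERES] for r in REGIONS]


# One matrix per paradigm.
maps = {
    "monetary_reward": make_matrix({("region_a", "left"): 0.8, ("region_a", "right"): 0.6}),
    "mental_rotation": make_matrix({("region_c", "right"): 1.2}),
}


def lookup(paradigm, region, hemisphere):
    """Retrieve one activity value by paradigm, region, and hemisphere."""
    return maps[paradigm][REGIONS.index(region)][HEMISPHERES.index(hemisphere)]
```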

[0070] In one embodiment, the datastructure includes a first matrix that includes 
functional information obtained during a first paradigm and a second matrix that includes 
functional information obtained during a second paradigm.

[0071] In one embodiment, the datastructure includes a first matrix that includes
first values that depend on a dataset obtained by imaging the subject at multiple
timepoints (e.g., a native dataset, e.g., rasterized data), wherein the first values are
independent of information from other subjects, and a second matrix that includes second
values that depend on the same dataset, wherein the second values are determined or are
selected as a function of information from other subjects. In one embodiment, the second
values are selected based on location of activation centers detected in an aggregate of
image information from a plurality of other subjects.

[0072] In one embodiment, the first values are determined and/or selected as a
function of location of activation centers detected by clustering signal changes from a
baseline, wherein the signal changes are independent of information from any other
subject.

[0073] The datastructure can be encoded in machine-accessible media or
memory. The datastructure can also be transmitted, e.g., as a signal (e.g., a modulated or 
encoded) or other communication, e.g., electronically or digitally. 

[0074] In one aspect, the invention features a method that includes: providing
(e.g., imaging or receiving) native information about brain function of a subject during a
mental process, the information including quantitative data for signals in at least a
plurality of regions; comparing signals during the mental process to reference signal
parameters to locate regions of activity; and populating a datastructure with information
about signals at least in the regions of activity. The method can provide a systems
biology map.

[0075] In one embodiment, the reference signal parameters are a function of a
baseline for the subject. In another embodiment, the reference signal parameters are a
function of signals from a population of subjects.
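
The comparison to a baseline can be sketched, purely by way of example, as follows. The sketch assumes that the reference signal parameter for each region is the mean of that subject's baseline samples and that a region is treated as active when its percent signal change reaches a chosen threshold. The function name, the threshold, and the signal values are hypothetical.

```python
from statistics import mean


def locate_active_regions(task_signals, baseline_signals, min_percent_change=1.0):
    """Return {region: percent signal change} for regions whose change
    relative to the subject's own baseline reaches the threshold."""
    active = {}
    for region, samples in task_signals.items():
        baseline = mean(baseline_signals[region])
        change = 100.0 * (mean(samples) - baseline) / baseline
        if abs(change) >= min_percent_change:
            active[region] = change     # populate the map only for active regions
    return active


task = {"region_a": [102.0, 104.0], "region_b": [100.0, 100.2]}
rest = {"region_a": [100.0, 100.0], "region_b": [100.0, 100.0]}
systems_map = locate_active_regions(task, rest)
```

Here only the first region is retained, since the second changes by about 0.1 percent relative to its baseline.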

[0076] In one embodiment, locating regions of activity includes clustering signal
changes relative to the reference signal parameters. For example, the clustering includes
defining foci in a three-dimensional coordinate space. In one embodiment, the
comparing includes generating a statistical map, e.g., as a function of correlation between
a gamma function and signal changes. The method can include other features described
herein.
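
For illustration only, the correlation between a gamma function and a signal time course might be computed as sketched below. The sketch assumes a gamma-variate curve of the form t^(shape-1) * exp(-t/scale) as the model response and uses the Pearson correlation coefficient; the shape and scale values, the sampling times, and the two example time courses are assumptions and are not taken from this disclosure.

```python
import math


def gamma_response(t, shape=6.0, scale=1.0):
    """Unnormalized gamma-variate curve; parameter values are illustrative."""
    return (t ** (shape - 1)) * math.exp(-t / scale) if t > 0 else 0.0


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


times = [float(t) for t in range(20)]
model = [gamma_response(t) for t in times]
responsive = [100.0 + 0.01 * m for m in model]            # signal that follows the model
flat = [100.0 + ((-1) ** i) * 0.5 for i in range(20)]     # alternating fluctuation
r_responsive = pearson(model, responsive)
r_flat = pearson(model, flat)
```

Repeating the correlation for each voxel or region yields one value per location, which together form a statistical map.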

[0077] In another aspect, the invention features a method that includes: providing
(e.g., imaging or receiving) datasets (e.g., native or rasterized datasets) about brain
function for a plurality of subjects during a mental process, the information including
quantitative data for signals in at least a plurality of regions; combining information from
the datasets to provide an aggregate dataset; and localizing regions of activity in the
aggregate dataset.

[0078] In one embodiment, the combining includes one or more of: transforming 
native datasets to a reference coordinate frame, averaging the native datasets, and
producing a statistical map. In one embodiment, the localizing includes clustering signal 
changes in the aggregate dataset. 
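
A minimal sketch of the averaging step is given below as a non-limiting example. It assumes that the datasets have already been transformed to a shared coordinate frame and that each maps a coordinate to a signal change. A simple threshold stands in for the localizing step; a clustering procedure that groups neighboring coordinates would be used in its place in a fuller implementation. All names and values are hypothetical.

```python
def aggregate(datasets):
    """Coordinate-wise mean of datasets that share a reference coordinate frame.
    Each dataset maps an (x, y, z) coordinate to a signal change."""
    coords = set.intersection(*(set(d) for d in datasets))
    return {c: sum(d[c] for d in datasets) / len(datasets) for c in coords}


def localize(aggregate_map, threshold):
    """Coordinates whose mean signal change meets the threshold."""
    return sorted(c for c, v in aggregate_map.items() if v >= threshold)


subject_1 = {(10, 20, 5): 2.0, (11, 20, 5): 0.2}
subject_2 = {(10, 20, 5): 3.0, (11, 20, 5): 0.4}
mean_map = aggregate([subject_1, subject_2])
foci = localize(mean_map, threshold=1.0)
```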

[0079] In another aspect, the invention features a method that includes: providing
(e.g., imaging or receiving) native datasets about brain function for a plurality of subjects
during a mental process, the information including quantitative data for signals in at least
a plurality of regions; for each subject, producing a first systems biology map from the
native dataset of the particular subject, wherein the first systems biology map is
independent of the native datasets from the other subjects, and a second systems biology
map that is a function of regions of activity detected in an aggregate dataset from the
plurality of subjects.

[0080] In another aspect, the invention features a method that includes: providing 
(e.g., imaging or receiving) information about structure and/or function of the brain of the 
subject, the information including quantitative data for at least a plurality of regions;
objectively evaluating the information using quantitative criteria; and providing a
diagnosis for the subject based on results of the evaluating. For example, the 
quantitative data includes information about brain function during a plurality of mental 
processes. In one embodiment, at least one mental process includes a paradigm, e.g., a 
paradigm that evokes the information backbone for motivation. An objective evaluation 
is typically completely free of the bias or potential bias of a human analyst. Bias may
still be produced by a blind or double-blind human analyst, because the analyst is using
non-quantitative metrics to make a decision.

[0081] In one embodiment, the evaluating includes comparing the information
about the subject to a decision tree. For example, the comparing includes evaluating a
probability of association for the information about the subject and one or more terminal
nodes of the tree. In another example, the comparing includes evaluating a probability of
association for the information about the subject and each bifurcation of the tree. In 
another example, the evaluating includes evaluating a probability that the information 
about the subject is within a classification, wherein the classification is a function of 
quantitative activity measures for a plurality of brain regions. The method can include 
other features described herein. 
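
One way to picture the comparison to a decision tree is sketched below. Each bifurcation tests a quantitative activity measure for one brain region, and each terminal node carries class probabilities. The tree itself (regions tested, thresholds, class names, and probabilities) is invented for illustration and is not a classification from this disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    region: Optional[str] = None          # region tested at a bifurcation
    threshold: float = 0.0                # cut-off on that region's activity measure
    low: Optional["Node"] = None          # branch taken when measure < threshold
    high: Optional["Node"] = None         # branch taken when measure >= threshold
    probabilities: Optional[Dict[str, float]] = None  # set only on terminal nodes


def evaluate(node: Node, measures: Dict[str, float]) -> Tuple[Dict[str, float], List[Tuple[str, str]]]:
    """Walk the tree and return terminal-node probabilities plus the path taken."""
    path = []
    while node.probabilities is None:
        branch = "high" if measures[node.region] >= node.threshold else "low"
        path.append((node.region, branch))
        node = node.high if branch == "high" else node.low
    return node.probabilities, path


# Hypothetical tree: regions, thresholds, and probabilities are invented.
tree = Node(
    region="Nucleus Accumbens", threshold=0.5,
    low=Node(probabilities={"class A": 0.8, "class B": 0.2}),
    high=Node(
        region="Amygdala", threshold=1.0,
        low=Node(probabilities={"class A": 0.3, "class B": 0.7}),
        high=Node(probabilities={"class A": 0.1, "class B": 0.9}),
    ),
)

probabilities, path = evaluate(tree, {"Nucleus Accumbens": 0.7, "Amygdala": 0.4})
```

Because every decision is a numeric comparison, the same input always yields the same path and the same probabilities, which is the sense in which such an evaluation does not depend on an analyst's judgment.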

[0082] In another aspect, the invention features a method that includes: imaging 
regions of the brain of a subject while at least one of the regions is active to obtain a 
native dataset (e.g., including rasterized image information) that includes information 
about activity in one or more of the regions at a plurality of temporal instances (or 
receiving the native dataset); and condensing the native dataset to provide a condensed 
dataset that includes quantitative information about at least some of the imaged regions. 
In one embodiment, the condensed dataset includes information about one or more 
activity peaks in at least some of the imaged regions. In one embodiment, the condensed 
dataset discards time resolution for at least 10, 20, 30, 50, 70, 80, 90, or 100% of the 
regions. 

[0083] In one embodiment, the regions are imaged by fMRI. In one
embodiment, the condensing reduces data size at least 10, 10^2, 10^3, 10^4, 10^5, or 10^6 fold.

[0084] In one embodiment, the condensed dataset includes information that can
be represented as a matrix, one dimension of which differentiates among regions of the
brain (e.g., a region in Table 1).
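
A minimal sketch of condensing, under simplifying assumptions: the native data are taken to be one region-averaged time series per region, and the condensed dataset keeps only the peak percent signal change for each region, discarding time resolution. The signal values, baselines, and function names are hypothetical.

```python
def percent_signal_change(series, baseline):
    """Express each sample as a percentage change from the baseline signal."""
    return [100.0 * (s - baseline) / baseline for s in series]


def condense(native, baselines):
    """Keep one peak value per region and drop the time axis."""
    condensed = {}
    for region, series in native.items():
        changes = percent_signal_change(series, baselines[region])
        condensed[region] = max(changes, key=abs)  # largest excursion, either sign
    return condensed


# Hypothetical region-averaged signals at four time points.
native = {
    "Amygdala": [100.0, 102.0, 101.0, 100.0],
    "Putamen": [200.0, 198.0, 196.0, 200.0],
}
baselines = {"Amygdala": 100.0, "Putamen": 200.0}

condensed = condense(native, baselines)
fold_reduction = sum(len(s) for s in native.values()) / len(condensed)
```

With only four time points the reduction here is small; a voxel-level native dataset with many time points would shrink far more when reduced to one value per region.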

[0085] In another aspect, the invention features a method that includes: imaging 
regions of the brain of a subject during a mental process to obtain a dataset (e.g., a native 
dataset) that includes information about brain function; and populating variables in a 
matrix by extracting quantitative information from the dataset. In one embodiment,
the matrix includes at least two dimensions.

[0086] In one embodiment, the first dimension resolves different regions of the 
brain. In one embodiment, the second dimension resolves the left and right hemisphere 
of the brain. In one embodiment, the matrix includes a third dimension. In one 
embodiment, information about one or more activations in a given region and 
hemisphere are provided at respective variables of the matrix. 

[0087] In one embodiment, the information includes a list, the members of the list 
being stored at different positions along a third dimension of the matrix. In one 
embodiment, the matrix does not provide information about time, e.g., the information 
about the one or more activations is not time-resolved. 
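
The matrix layout described above might be sketched as nested lists, with the first dimension indexing regions, the second indexing hemispheres, and the third holding the members of a list of activations. The region subset, the list length, and the activation values are assumptions chosen for illustration.

```python
REGIONS = ["Caudate", "Putamen", "Amygdala"]  # a few Table 1 names, for illustration
HEMISPHERES = ["left", "right"]
MAX_PEAKS = 3  # assumed length of the third (list) dimension


def empty_matrix():
    """Region x hemisphere x list-position matrix, filled with None."""
    return [[[None] * MAX_PEAKS for _ in HEMISPHERES] for _ in REGIONS]


def populate(matrix, activations):
    """Store each (region, hemisphere, value) at the next free list position."""
    for region, hemisphere, value in activations:
        cell = matrix[REGIONS.index(region)][HEMISPHERES.index(hemisphere)]
        slot = cell.index(None)  # raises ValueError if the list is already full
        cell[slot] = value
    return matrix


# Invented activation values (e.g., percent signal change).
activations = [
    ("Amygdala", "left", 1.2),
    ("Amygdala", "left", 0.4),
    ("Putamen", "right", -0.8),
]
matrix = populate(empty_matrix(), activations)
```

Note that no axis of this matrix represents time, so the stored activations are not time-resolved.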

[0088] The imaging can include, e.g., neuroimaging, e.g., tomography, e.g., using an MRI,
fMRI, MEG, fCT, OI, SPECT, or PET system.

[0089] In one embodiment, the method provides a systems biology map that includes
functional information. For example, the systems biology map includes information 
about activity in a plurality of brain regions in at least one mental process, e.g., a 
paradigm, e.g., in at least two, three, four, or five paradigms. In one embodiment, the 
plurality of brain regions includes at least five, ten, twenty, thirty, forty, fifty, or sixty 
brain regions. For example, at least one, ten, twenty, or thirty of the brain regions of the 
plurality are selected from Table 1. Subregions or smaller volumes than the exemplary
regions in Table 1 can also be used, as can regions that are defined by larger volumes and
encompass one or more of the exemplary regions.

[0090] In one embodiment, the information for each of the brain regions is
independent of reference to a coordinate frame. For example, the brain regions can be
identified by a numerical index (e.g., an index value for each of a set of predefined
regions) or by text (e.g., a categorical reference) or an indirect reference (e.g., use of
pointers and hyperlinks). In another embodiment, one or more of the brain regions can be
identified by reference to a coordinate frame, e.g., Talairach coordinates. In such examples,
however, the information is not indexed voxel by voxel, so that it is not in the form of a
raster, i.e., the information is non-rasterized.

[0091] In one embodiment, the paradigm interacts with the informational 
backbone for motivation, e.g., it evokes at least one region in the informational backbone 
for motivation. In one embodiment, the paradigm evokes at least one region in the 
informational backbone for motivation. In one embodiment, the paradigm interacts with
mechanisms for representation and convergence, feature evaluation, probability
assessment, outcome processing, valuation, reward/aversion processing, counterfactual
comparisons, and memory. In one embodiment, the paradigm interacts with mechanisms
for selection of objectives for fitness, mechanisms for selection of behavior, or 
information processing (e.g., reception). In one embodiment, the paradigm interacts with 
mechanisms for language and symbol processing, mechanisms for communication, 
and/or mechanisms for social behavior. 

[0092] In an embodiment in which there is information for at least two paradigms,
these paradigms may interact with overlapping, but non-coextensive regions of the brain,
e.g., each paradigm may interact with at least one region that is not activated in another 
paradigm by a normal subject. 

[0093] Exemplary paradigms include: a social reward paradigm, a CPT / 
probability paradigm, a physiological aversion / pain paradigm, a mental rotation 
paradigm, an emotional faces paradigm, and a monetary reward paradigm. Other 
paradigms can also be used. For example, another paradigm which interrogates the 
informational backbone for motivation or other areas described herein, e.g., an area 
interrogated by one of the above paradigms, can be used.

[0094] In one embodiment, the information about activity for at least one of the
regions includes deviations from a reference (e.g., percentage differences, ratios, and 
subtractive values). 

[0095] In one embodiment, the systems biology map includes a plurality of 
matrices, each matrix including information about neural activity in a plurality of defined 
brain regions during different paradigms. In another embodiment, the map includes a 
similar or identical set of information, but is stored or represented in another form, e.g.,
as text, graphic, e.g., as a vector, table, etc. 

[0096] In another aspect, the invention features a method that includes: receiving 
a native dataset that includes imaged information about brain function of a subject; and
populating variables in a matrix by extracting quantitative information from the native 
dataset. The method can be used to provide a systems biology map, e.g., as described 
herein. 

[0097] In another aspect, the invention features a method that includes: imaging 
regions of the brain of a plurality of subjects; transforming image information to a 
reference coordinate space; selecting a subset of regions for which activations are 
detected among the plurality of subjects; and producing a condensed dataset for each 
subject of the plurality, wherein the condensed dataset is smaller than the native dataset
for each subject of the plurality and retains information about the selected subset of 
regions. In one embodiment, selecting the subset includes averaging the transformed 
image information and evaluating statistically significant changes relative to results of the 
averaging. In one embodiment, selecting the subset includes selecting regions that differ 
from a reference (e.g., a baseline obtained before or after the mental process). The method
can include other features described herein. 

[0098] In another aspect, the invention features a method that includes: receiving 
functional information about neural circuit activity, the information being obtained by 
imaging a plurality of brain regions in a subject, and generating a dataset that associates 
each of a plurality of brain regions with quantitative information, wherein the 
quantitative information includes lists of activation peaks (e.g., % signal change) and 
each list is associated with at least one of the brain regions. In one embodiment, the list
is rank ordered. In one example, the dataset is represented as a matrix. For example,
members of each list are positioned or referenced in consecutive cells along one axis of 
the matrix. 

[0099] In another example, the dataset is represented as a vector or is stored in a 
relational database, e.g., as a table. The method can include other features described 
herein. 
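
As a rough illustration of storing such a dataset in a relational table, the sketch below uses Python's built-in sqlite3 module. The table name, column names, subject identifiers, and values are hypothetical; ranking by absolute magnitude is one assumed reading of "rank ordered".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE activation_peak (
        subject_id TEXT,
        paradigm TEXT,
        region TEXT,
        hemisphere TEXT,
        percent_signal_change REAL
    )
    """
)

# Invented rows: one activation peak per row.
rows = [
    ("S01", "monetary reward", "Nucleus Accumbens", "left", 0.6),
    ("S01", "monetary reward", "Nucleus Accumbens", "right", 1.1),
    ("S01", "monetary reward", "Amygdala", "left", -0.9),
    ("S02", "monetary reward", "Nucleus Accumbens", "left", 0.3),
]
conn.executemany("INSERT INTO activation_peak VALUES (?, ?, ?, ?, ?)", rows)


def ranked_peaks(connection, subject_id, region):
    """Return one subject's peaks for a region, largest magnitude first."""
    cursor = connection.execute(
        "SELECT percent_signal_change FROM activation_peak "
        "WHERE subject_id = ? AND region = ? "
        "ORDER BY ABS(percent_signal_change) DESC",
        (subject_id, region),
    )
    return [row[0] for row in cursor.fetchall()]


nacc_peaks = ranked_peaks(conn, "S01", "Nucleus Accumbens")
```

A table of this kind keeps each list associated with its brain region through the `region` column, while the ordering is applied at query time rather than stored.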

[00100] In another aspect, the invention features a method that includes: evaluating a
subject to produce a first systems biology map of the subject; treating the subject; and
evaluating the subject to produce a second systems biology map of the subject; wherein
the first and second systems biology maps include quantitative information about brain
function in a plurality of brain regions during at least one mental process, e.g., a
paradigm. The method can be used, e.g., to evaluate a treatment.
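
A simple way to compare a first and a second systems biology map is to take region-by-region differences, as in the sketch below. The map representation (a dictionary keyed by paradigm and region), the values, and the function name are assumptions for illustration only.

```python
def map_change(first_map, second_map):
    """Difference (second minus first) for entries present in both maps."""
    shared = first_map.keys() & second_map.keys()
    return {key: second_map[key] - first_map[key] for key in sorted(shared)}


# Invented values keyed by (paradigm, region), e.g., percent signal change.
before_treatment = {
    ("monetary reward", "Nucleus Accumbens"): 1.5,
    ("monetary reward", "Amygdala"): 0.5,
}
after_treatment = {
    ("monetary reward", "Nucleus Accumbens"): 1.0,
    ("monetary reward", "Amygdala"): 0.75,
}

change = map_change(before_treatment, after_treatment)
```

The resulting differences could then be evaluated against whatever quantitative criteria are chosen for judging the treatment.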

[00101] In one embodiment, treating the subject includes administering an agent to 
the subject. Examples of the agent include a pharmaceutical, a narcotic, an addictive 
substance, or a non-addictive substance. 

[00102] In one embodiment, treating the subject includes providing a non-invasive
therapy to the subject. For example, the non-invasive treatment can include hypnosis, 
music, video, visual, superficial contacts, exercise, or physical pressure. 

[00103] The systems biology maps can be maps described herein. For example,
they can include information about activity in a plurality of brain regions in at least one 
mental process, e.g., a paradigm. They can include information about activity in a 
plurality of brain regions in at least two paradigms. 

[00104] In one embodiment, the systems biology map includes structural 
information (e.g., only structural information).

[00105] In one embodiment, the systems biology map includes functional
information. For example, the systems biology map includes information about activity
in a plurality of brain regions in at least one mental process, e.g., a paradigm, e.g., in at 
least two, three, four, or five paradigms. In one embodiment, the plurality of brain 
regions includes at least five, ten, twenty, thirty, forty, fifty, or sixty brain regions. For
example, at least one, ten, twenty, or thirty of the brain regions of the plurality are
selected from Table 1. Subregions or smaller volumes than the exemplary regions in
Table 1 can also be used, as can regions that are defined by larger volumes and 
encompass one or more of the exemplary regions. 

[00106] In one embodiment, the information for each of the brain regions is
independent of reference to a coordinate frame. For example, the brain regions can be
identified by a numerical index (e.g., an index value for each of a set of predefined
regions) or by text (e.g., a categorical reference) or an indirect reference (e.g., use of
pointers and hyperlinks). In another embodiment, one or more of the brain regions can be
identified by reference to a coordinate frame, e.g., Talairach coordinates. In such examples,
however, the information is not indexed voxel by voxel, so that it is not in the form of a
raster, i.e., the information is non-rasterized.

[00107] In one embodiment, the paradigm interacts with the informational 
backbone for motivation, e.g., it evokes at least one region in the informational backbone 
for motivation. In one embodiment, the paradigm evokes at least one region in the 
informational backbone for motivation. In one embodiment, the paradigm interacts with 
mechanisms for representation and convergence, feature evaluation, probability 
assessment, outcome processing, valuation, reward/aversion processing, counterfactual 
comparisons, and memory. In one embodiment, the paradigm interacts with mechanisms 
for selection of objectives for fitness, mechanisms for selection of behavior, or 
information processing (e.g., reception). In one embodiment, the paradigm interacts with
mechanisms for language and symbol processing, mechanisms for communication, 
and/or mechanisms for social behavior. 

[00108] The systems biology map can include information obtained by imaging,
e.g., neuroimaging, e.g., tomography, e.g., using an MRI, fMRI, MEG, fCT, OI, SPECT, or PET
system.

[00109] In an embodiment in which there is information for at least two paradigms,
these paradigms may interact with overlapping, but non-coextensive regions of the brain, 
e.g., each paradigm may interact with at least one region that is not activated in another 
paradigm by a normal subject.

[00110] Exemplary paradigms include: a social reward paradigm, a CPT /
probability paradigm, a physiological aversion / pain paradigm, a mental rotation 
paradigm, an emotional faces paradigm, and a monetary reward paradigm. Other 
paradigms can also be used. For example, another paradigm which interrogates the 
informational backbone for motivation or other areas described herein, e.g., an area 
interrogated by one of the above paradigms, can be used.

[00111] In one embodiment, the information about activity for at least one of the
regions includes deviations from a reference (e.g., percentage differences, ratios, and
subtractive values). 

[00112] In one embodiment, the systems biology map includes a plurality of
matrices, each matrix including information about neural activity in a plurality of defined
brain regions during different paradigms. In another embodiment, the map includes a
similar or identical set of information, but is stored or represented in another form, e.g.,
as text, graphic, e.g., as a vector, table, etc. 

[00113] In another aspect, the invention features a method that includes: providing
a dataset that includes quantitative information about brain activity during at least two 
paradigms for each of a plurality of subjects; evaluating a parameter that is a continuous 
function of at least two components of the quantitative information, the at least two 
components being associated with different paradigms; and analyzing a statistic for 
association between the parameter and an allele for one or more genetic loci. For 
example, analyzing the statistic can include a linkage analysis, e.g., non-parametric
linkage analysis. 
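
The idea of a parameter that is a continuous function of components from two paradigms, tested against allele status, can be sketched as below. This is a deliberately simple two-group comparison (Welch's t statistic), not a linkage analysis; the parameter definition, the measures, and the carrier labels are all invented for illustration.

```python
import math
from statistics import mean, variance


def parameter(reward_change, aversion_change):
    """Hypothetical continuous function of measures from two paradigms."""
    return reward_change - aversion_change


def welch_t(group_a, group_b):
    """Welch's t statistic for a difference in means between two groups."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Invented records: (reward-paradigm measure, aversion-paradigm measure, carries allele).
subjects = [
    (2.5, 0.5, True), (3.5, 0.5, True), (4.5, 0.5, True),
    (1.5, 0.5, False), (2.5, 0.5, False), (3.5, 0.5, False),
]
carriers = [parameter(r, a) for r, a, has_allele in subjects if has_allele]
non_carriers = [parameter(r, a) for r, a, has_allele in subjects if not has_allele]
t_statistic = welch_t(carriers, non_carriers)
```

A larger absolute statistic indicates a larger standardized difference in the parameter between carriers and non-carriers; a family-based linkage method would replace this comparison in a real analysis.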

[0114] In another aspect, the invention features a method that includes: obtaining
a group of subjects, e.g., human subjects; imaging the CNS of each subject while the
respective subject is exposed to information (e.g., text, audio (e.g., music, speech), video
(e.g., advertising), etc.); evaluating correlation between a characteristic of neural circuit
activity of the subjects and alleles present at one or more genetic markers; and providing 
an evaluation of the information as a function between the characteristic and the 
frequency of an allele in a population. The method can include other features described 
herein.

[0115] In one exemplary method, subjects (e.g., human patients) are imaged using
a plurality of procedures to produce tomographic maps. Generally, at least two (e.g., at
least three, four, five, or six) different procedures are used. The plurality of procedures
can include functional imaging during one or more paradigms, morphogenetic mapping
of anatomical features, diffusion tensor analysis for white matter, radiological imaging,
f-deoxy-glucose scanning, cerebral blood flow, and % cellular viability.

[0116] Raw image data are translated into a multi-dimensional quantitative
"systems biology map" that provides a complex representation of neuropsychiatric
function. Because multiple procedures are used, the representation can span more than
one cognitive center. Certain combinations of procedures can produce a nearly
continuous map that is a holistic measure of neuropsychiatric function.

[0117] The systems biology map (SBM) can be displayed to a user as a matrix or
may even be rendered as a three-dimensional image of the brain. More typically, the
SB map is stored in a database for computational analysis. Data can be analyzed using
models, e.g., to assess the reward-aversion circuit, e.g., using behavioral economics
models.

[0118] These SB maps have many applications, including, for example,
evaluating a subject, diagnosing a subject, testing a therapeutic procedure or therapeutic 
compound, monitoring disease progression, monitoring therapy, and so on. 

[0119] The fineness of the map may, for example, separate a general behavior 
perceived as a single disease into two or more distinguishable disorders. Further, the
technique can be applied to non-human animals (e.g., primates and voles) and may be 
used in conjunction with administering a drug, evaluating gene expression, and so forth. 

[0120] The following are some exemplary features: a dataset that includes 
functional tomography for more than one paradigm; a dataset that includes parameters 
representing properties of more than one behavioral circuit; a multi-dimensional matrix
that is a function of imaged neural activity during a behavior and imaged anatomical
features (etc., for other combinations); a multi-dimensional matrix that is a function of at
least three different images of the brain.

[0121] An exemplary method of correlating a neuropsychiatric trait with a genetic
locus may include: obtaining imaging information and genetic information from a
population of individuals; generating a multi-dimensional systems biology (SB) map for
each individual of the population; quantitatively sorting the individuals based on their
respective "maps", e.g., using an association rule algorithm, thereby identifying a
subpopulation; and comparing polymorphisms at at least one genetic locus between individuals
of the subpopulation to evaluate linkage between a polymorphism and members of the
subpopulation.

[0122] For example, the comparing can include a genome scan to identify a
genetic marker with a significant LOD score for the subpopulation. The method can also
include comparing polymorphisms of individuals excluded from the subpopulation, e.g., 
to detect whether absence of an allele is determinative. Other genetic methods (e.g., 
families, linkage disequilibrium, etc.) can be incorporated. 
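
For orientation, a LOD score is the base-10 logarithm of the ratio between the likelihood of the data at a given recombination fraction and the likelihood under no linkage (recombination fraction 0.5). The sketch below computes it for the simplest textbook case, phase-known meioses counted as recombinant or non-recombinant. The counts are invented, and a genome scan as described above would involve considerably more than this two-point calculation.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point LOD score for phase-known meioses at recombination fraction theta."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_likelihood_linked = (
        recombinants * math.log10(theta) + non_recombinants * math.log10(1 - theta)
    )
    log_likelihood_unlinked = meioses * math.log10(0.5)
    return log_likelihood_linked - log_likelihood_unlinked


# Invented counts: 2 recombinants observed in 20 informative meioses.
lod = lod_score(recombinants=2, meioses=20, theta=0.1)
```

By long-standing convention, a LOD score of 3 or more is commonly treated as significant evidence for linkage, so the example counts would just exceed that threshold.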

[0123] After a genetic polymorphism is associated with a neuropsychiatric trait, a
bottom-up approach is used to evaluate individuals who have the polymorphism. The
individuals can be evaluated at the extremes of function, and imaged as described above 
to produce a SB map. Typically, the individuals that are evaluated are not members of 
the study that linked the polymorphism to the trait. 

[0124] This approach can have the following applications: provide confirmatory
information, provide information for construction of a second model of neuropsychiatric
function, and enable extrapolation of genetic information to a second population of
individuals. 

[0125] The sorting can use criteria for at least two dimensions of the SB map.

[0126] Many of the methods described herein can be embodied as software, e.g.,
as machine-executable instructions. The software can be stored on a machine-readable or
accessible medium or as an article, e.g., a commodity. Such methods can also be 
implemented on a machine. Many steps within such methods can be executed, e.g., by 
interaction with a user or automatically. Methods can also be implemented across a 
network, e.g., an intranet or internet. For example, the network can link a health care
provider and a patient, a physician (e.g., a radiologist) and a patient, and different
physicians (e.g., a radiologist and psychiatrist). Communications between members of
the network can be secure, web-accessible, and can include hypertext, rotatable images, 
and other interactive and/or cartographic display techniques. 

[0127] Some implementations of the invention enable providing a continuous
function of disease risk.

[00128] As used herein, the term "circuit" refers to identifiable regions of the brain
that are operational during a function such as a paradigm or other task. Typically such
regions are distributed in space, but interact with one another. The brain is not modular
but is a distributed system. 

[00129] All cited patents, patent applications, and references are incorporated by 
reference in their entireties. In particular, U.S. published applications 2002-0042563 
(09/822,585) and 09/729,665 are incorporated by reference in their entireties. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[00130] FIG. 1 is a schematic of exemplary interactions between the environment, 
genome, and epigenome. 

[00131] FIG. 2 is a schematic of an exemplary hierarchical organization that
generates behavior.

[00132] FIG. 3 is a schematic illustrating some levels of the components shown in 
FIG. 2. 

[0133] FIGs. 4a, 4b, and 4c are exemplary models for cognition. 

[0134] FIG. 5 is an exemplary model of the informational backbone for
motivation (iBM) 20.

[0135] FIG. 6 is an exemplary combined model 10 for motivation that depicts the
interaction between iBM 20, a behavioral mechanism 40, and a selection
mechanism 30.

[0136] FIG. 7 depicts the combined model for motivation 10 and interactions
among its components.

[0137] FIG. 8 depicts iBM components and mapping of five exemplary
paradigms onto iBM components using a color code.

[0138] FIG. 9 depicts aspects of an exemplary monetary reward paradigm and
results obtained in particular experiments. 

[0139] FIG. 10a depicts an exemplary qualitative systems biology map. 

[0140] FIGs. 10b and 11 highlight some regions that feature in exemplary
paradigms. 

[0141] FIG. 12 depicts an exemplary qualitative systems biology map.

[0142] FIG. 13 depicts brain regions which may feature in three putative 
endophenotypes for major depressive disorder. 

[0143] FIG. 14 depicts brain regions which may feature in some different
disorders. 

[0144] FIG. 15 is a schematic of an exemplary phenotype genotype approach. 

[0145] FIG. 16 is a schematic of an exemplary genotype phenotype approach. 

[0146] FIG. 17 is a flow chart of an exemplary method for producing a QSBM 
(quantitative systems biology map). 

[0147] FIG. 18 is a flow chart of an exemplary method for generating a
classification tree.

[0148] FIGs. 19A, B, and C are exemplary methods for associating genotypic and
phenotypic information. 

[0149] FIG. 20 depicts an exemplary set of matrices.

[0150] FIG. 21 is a schematic of an exemplary system 300. 

[0151] FIG. 22 depicts an exemplary c[...].

[0152] FIG. 23 is a schematic of an exemplary apparatus.

[0153] FIG. 24 shows binning across a spectrum. FIG. 25 shows schematics of a
binary tree.






Attorney Docket No. 00786-813P01 



DETAILED DESCRIPTION 

[0154] The Informational Backbone of Motivation (iBM)

[0155] One central feature of the mind is the "informational backbone of
motivation" or "iBM." The iBM is a large domain of the brain that processes information
for motivation. See, e.g., FIG. 8. The iBM encompasses a number of circuits which
participate in motivation. The iBM includes a number of component mechanisms,
including, e.g., mechanisms for representation and convergence, feature evaluation,
probability assessment, outcome processing, valuation, reward/aversion processing,
counterfactual comparisons, and memory. Paradigms can trigger one or more of these
mechanisms, although not every paradigm evokes every circuit or structure in the iBM.

[0156] Other central features of the mind are depicted in FIG. 6. Exemplary 
component circuits include the reward/aversion circuit, working memory, centers of 
language and social behavior. Other components are involved in valuation, outcome
processing, probability assessment, feature evaluation, representation & convergence, 
reception, counterfactual comparisons, and other behaviors. It is possible to select or 
design paradigms that evoke one or more of these components and evaluate their function 
during the paradigm, e.g., as described for the exemplary paradigms provided herein. 

[0157] The reward/aversion circuit is part of the iBM. The reward/aversion
circuit allows the organism to assign a value to objects in the environment so as to work
for "rewards" and avoid "punishments" (aversive outcomes). This circuit can include an
extended set of subcortical gray matter regions (nucleus accumbens (NAc), caudate,
putamen, sublenticular extended amygdala (SLEA), amygdala, hippocampus,
hypothalamus, and thalamus) and domains of the paralimbic girdle [including the
orbitofrontal cortex (GOb), insula, cingulate cortex, parahippocampus, and temporal pole]
that receive dopaminergic projections from the ventral tegmental area and substantia
nigra (here jointly referred to as the ventral tegmentum, VT).

[0158] Some additional exemplary regions of the brain that can be imaged and 
described in a systems biology map are provided in Table 1.

Table 1: Exemplary Brain Regions

1. Transverse Cerebral Fissure and Third Ventricle
2. Optic Chiasm
3. Fourth Ventricle
4. Brainstem
5. Lateral Ventricles
6. Caudate
7. Putamen
8. Nucleus Accumbens
9. Pallidum
10. Thalamus
11. Ventral Diencephalon
12. Inferior Lateral Ventricles
13. Amygdala
14. Hippocampus
15. Angular Gyrus
16. Intracalcarine Cortex
17. Cingulate Gyrus, anterior division
18. Cingulate Gyrus, posterior division
19. Cuneal Cortex
20. Central Opercular Cortex
21. Superior Frontal Gyrus
22. Middle Frontal Gyrus
23. Inferior Frontal Gyrus, pars opercularis
24. Inferior Frontal Gyrus, pars triangularis
25. Frontal Medial Cortex
26. Frontal Operculum Cortex
27. Frontal Orbital Cortex
28. Frontal Pole
29. Heschl's Gyrus (includes H1 and H2)
30. Insular Cortex
31. Juxtapositional Lobule Cortex (formerly Supplementary Motor Cortex)
32. Lingual Gyrus
33. Occipital Fusiform Gyrus
34. Lateral Occipital Cortex, inferior division
35. Lateral Occipital Cortex, superior division
36. Occipital Pole
37. Paracingulate Gyrus
38. Precuneous Cortex
39. Parahippocampal Gyrus, anterior division
40. Parahippocampal Gyrus, posterior division
41. Parietal Operculum Cortex
42. Postcentral Gyrus
43. Planum Polare
44. Precentral Gyrus
45. Planum Temporale
46. Subcallosal Cortex
47. Supracalcarine Cortex
48. Supramarginal Gyrus, anterior division
49. Supramarginal Gyrus, posterior division
50. Superior Parietal Lobule
51. Superior Temporal Gyrus, anterior division
52. Superior Temporal Gyrus, posterior division
53. Middle Temporal Gyrus, anterior division
54. Middle Temporal Gyrus, posterior division
55. Inferior Temporal Gyrus, anterior division
56. Inferior Temporal Gyrus, posterior division
57. Temporal Fusiform Cortex, anterior division
58. Temporal Fusiform Cortex, posterior division
59. Middle Temporal Gyrus, temporooccipital part
60. Inferior Temporal Gyrus, temporooccipital part
61. Temporal Occipital Fusiform Cortex
62. Temporal Pole

[0159] An Exemplary Schema

[0160] Referring now to FIG. 1, behaviors can be explained by the interaction of
three major factors: the genome, the epigenome, and the environment. The genome
refers to the sequence content of nuclear genomic nucleic acid and mitochondrial
genomic nucleic acid and other resident nucleic acid, such as viral nucleic acid. The
epigenome refers to interaction of the genome with epigenetic factors that are 
transmissible, but variable modifications such as methylation, chromatin structure, long- 
range chromosomal effects (e.g., position effect variegation, transvection), and even 
RNAi (e.g., endogenous or exogenously added). The epigenome can function as a 
rheostat that reacts to create changes in biological function that are transmissible to a 
subsequent generation. 

[0161] Referring now to FIG. 2, systems biology functions at the interface 
between behavior and the genome and epigenome. The genome and epigenome can 
directly affect cellular functions. At a higher level, groups of cells interact, e.g., as neural 
networks. At a still higher level distributed groups are formed which can control 
behavior, e.g., by reacting to the environment. Although cells are a critical component of
the highest systems biology level, the impact of a single nucleotide in the genome or a
single epigenetic factor can be only a very small fraction of the complexity of the system.
Thus, a single genetic or epigenetic change may be difficult to detect in the noise of the
system. 

[0162] Referring now also to FIG. 3, a variety of methods are available to obtain 
information about each level in the systems biology hierarchy. The information can 
include both structural and functional information. For example, distributed neural 
groups can be evaluated by one or more of: magnetic resonance imaging (MRI) (also 
referred to as nuclear magnetic resonance or NMR) and other non-invasive techniques 
such as magnetic resonance spectroscopy (MRS), electroencephalography (EEG),
magnetoencephalography (MEG), positron emission tomography (PET, including labeled
ligand studies), optical imaging (OI), single photon emission computer tomography
(SPECT), and functional computerized tomography (fCT). MRI methods include
functional magnetic resonance imaging (fMRI), which provides information about
function of neural groups. The map may include, for example, structural information
such as morphometric information about anatomical features, diffusion tensor analysis 
for white matter, radiological imaging, f-deoxy-glucose scanning, cerebral blood flow, 
and % cellular viability. 

[0163] Local circuits (e.g., neural groups) can be detected, e.g., by multicellular
recording (e.g., during surgery of humans or by monitoring non-humans) or even by high
resolution (or "fine") tomography, for example, by evaluating SO isotropic voxels at
7 T. Exemplary methods for evaluating cells include: evaluating local field potentials 
(LFPs), e.g., using implanted electrodes; evaluating ion or electrochemical potentials and 
flux (e.g., Ca cascades, e.g., by voltometry); and evaluating gene and protein expression 
(e.g., using microarrays, antibodies, and mass spectroscopy). Methods for evaluating the 
genome and epigenome are described below. Many methods refer generally to genetic 
markers and genetic analysis. Such methods can also include evaluating epigenetic 
features associated with such markers. 

[0164] Information from different levels of the hierarchy can be combined. For
example, mechanistic explanations can be derived by reductive linkage of descriptions
across (both up & down) scales.

[0165] Strategies for Relating Phenotype and Genotype

[0166] Three general strategies can be used to relate genetic markers to a 
phenotype defined by a systems biology map. 

[0167] Referring to the example in FIG. 19A, the first strategy 110 includes first
classifying 112 subjects according to their phenotype, e.g., using information from
systems biology maps (e.g., QSBMs). The classification process defines a plurality of
phenotypic classes. Genetic markers are evaluated 114 for their association with (e.g.,
within) at least one of the classes.

[0168] Referring to the example in FIG. 19B, the second strategy 120 includes first 
classifying 122 subjects according to their genotype, e.g., using genetic information.
The classification process defines a plurality of genotypic classes. Then, phenotypes
(e.g., information from systems biology maps) are evaluated to identify associations with
at least one of the genotypic classes. 

[0169] Referring to the example in FIG. 19C, the third strategy 130 includes 
concurrently classifying subjects by phenotype 112 and classifying them by genotype 
122. By exchanging information during the classification processes, a convergent
solution can be obtained 134 that associates genotype and phenotype. For example,
a convergence of results can be forced between the neuroimaging and the genotypic data.
This convergence relies on using outcomes from the evaluation of the neuroimaging data
as a set of association rules to prune the partitions found with the data mining of the 
genotypic data. In parallel with this process, the outcome of the evaluation of the 
genotypic data is used to constrain the outcome from the neuroimaging data. 

[0170] Many aspects of the first strategy 110 - which involves classification by
phenotype - have the further advantage of providing diagnostic and prognostic categories
that are useful even without genetic information or validated genetic associations. Also,
this first strategy does not necessarily require extensive family information, linkage
disequilibrium, founder effects, or requirements on the input population of subjects to
provide meaningful statistics for finding genes that are associated with a particular
phenotypic class.

[0171] Producing an Exemplary Systems Biology Map

[0172] Referring to the example in FIG. 8, a plurality of paradigms can be used to
generate a systems biology map that includes functional information about neural 
circuitry in a subject. 

[0173] FIG. 17 provides a flowchart for one exemplary method. A subject is
evaluated using fMRI during a first paradigm 211 and during a second paradigm 212.
Methods for evaluating subjects during paradigms are described, e.g., in US 2002/0042563
and below. Raw acquisition data can be mapped 214 onto a standard anatomical model,
e.g., the Talairach coordinates. In other implementations, other types of information can be
used instead of or in conjunction with the first and second paradigms. Such information
includes: anatomical and morphological information about brain structure and function,
and clinical information (see, e.g., below).

[0174] This information can be condensed 216 to produce a systems biology map,
e.g., a qualitative or quantitative systems biology map. The abbreviation "QSBM" is 
used to represent quantitative systems biology maps. Although it is possible to use the 
raw data directly, typically the "raw" or "native" dataset acquired by instruments (e.g., an 
MRI machine) during a paradigm is very large. For example, a typical dataset from
fMRI can include multiple 128x128 sections for 15 to 30 different slices and for about 
300 time points. If multiple runs of the paradigm are done, then the dataset is increased 
that many times. Parcellation and statistical analysis can further increase the dataset 10 
to 15 fold. Thus, it is possible to have at least 20 Mb or even up to 1 terabyte (1 Tb) of 
data for a single subject. However, this information can be processed to generate a 
matrix (e.g., in the kilobyte to 1 Mbyte range) that has a reduced size relative to the 
native dataset, but retains the useful information. Thus, byte-for-byte, information can be
condensed at least 10, 10^2, 10^3, 10^4, 10^5, or 10^6 fold, and ranges therebetween. The
ability to condense information into a meaningful and accessible format may be critical 
for the development and/or analysis of large databases of functional information about 
neural circuitry. 
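
By way of non-limiting illustration, the size arithmetic above can be checked with a short calculation. The sketch assumes 2 bytes per voxel, a value the text does not state, and the run count and 1 Mbyte map size used in the example are likewise assumptions chosen only to exercise the formula.

```python
def native_dataset_bytes(matrix=128, slices=30, time_points=300,
                         runs=1, expansion=1, bytes_per_voxel=2):
    """Approximate size in bytes of a native fMRI dataset.

    bytes_per_voxel is an assumed sample width, not a value from the text.
    expansion models the 10 to 15 fold growth from parcellation and statistics.
    """
    voxels = matrix * matrix * slices * time_points
    return voxels * bytes_per_voxel * runs * expansion


def condensation_fold(native_bytes, qsbm_bytes):
    """Byte-for-byte ratio between the native dataset and the condensed map."""
    return native_bytes / qsbm_bytes


# One run with 15 slices, then four runs with 30 slices and 15 fold expansion.
single_run = native_dataset_bytes(slices=15)
expanded = native_dataset_bytes(slices=30, runs=4, expansion=15)
fold = condensation_fold(expanded, 1_000_000)
print(single_run, expanded, fold)
```

Under these assumptions a single 15-slice run is roughly 147 million bytes, and the expanded four-run dataset is condensed more than 10^4 fold when reduced to a 1 Mbyte map.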

[0175] Extracting information for a QSBM typically involves discarding data.
Although compression techniques can be used, e.g. to store the QSBM, typically the 
QSBM is in a form that is easily accessible, e.g., for computation. Because data is 
typically discarded, it is usually not possible to regenerate the native dataset from the 
QSBM. In one embodiment, the QSBM discards time resolution. For example, the 
QSBM can merely retain a list of activations for each region without reference to the 
temporal dimension, although the list may be ordered according to time of occurrence. 

[0176] In one implementation the information is condensed into one or more matrices.
FIG. 20 illustrates a set of matrices. Each matrix includes one dimension (illustrated
vertically) that refers to different regions of the brain (e.g., regions 1, 2, 3, ... n, wherein
n refers to the nth region) and another dimension (horizontal) that refers to left and right
hemispheres of the brain. A third dimension (e.g., going into the page) is used to store a
list of values for a particular region-hemisphere pair. For fMRI data, for example, the list of
values may refer to % change of each of the activation peaks detected during a paradigm.
For example, if a region has three different activation peaks that are due to the following
changes in signal, 2.3%, -0.5%, and 1.2%, the sequence of values in the third dimension
can be {2.3%, 1.2%, -0.5%}.
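
By way of non-limiting illustration, the region-by-hemisphere structure with a list of values in the third dimension can be sketched as a nested mapping. The region names are placeholders, and sorting the values from largest to smallest is an assumption inferred from the ordering in the example above.

```python
from collections import defaultdict

HEMISPHERES = ("left", "right")


def build_qsbm(peaks):
    """Condense (region, hemisphere, percent_change) peaks into a nested map.

    The result maps region -> hemisphere -> list of % signal changes, sorted
    from largest to smallest (an assumed ordering).
    """
    qsbm = defaultdict(lambda: {h: [] for h in HEMISPHERES})
    for region, hemisphere, pct_change in peaks:
        qsbm[region][hemisphere].append(pct_change)
    for by_hemisphere in qsbm.values():
        for values in by_hemisphere.values():
            values.sort(reverse=True)
    return dict(qsbm)


# Hypothetical peaks; region names are placeholders.
peaks = [("region_1", "left", 2.3), ("region_1", "left", -0.5),
         ("region_1", "left", 1.2), ("region_2", "right", 0.8)]
qsbm = build_qsbm(peaks)
print(qsbm["region_1"]["left"])
```

A nested mapping is used here rather than a dense array because the number of activation peaks can differ from one region-hemisphere pair to the next.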

[0177] Other values may also be used (e.g., instead of or in conjunction with %
change), e.g., time to peak for each of the activation peaks detected during a paradigm, 
delay, FT, and slope. In another embodiment, information about each activation peak can 
further include information indicating location of the peak, e.g., where within the region 
the activation occurred and/or time, e.g., a reference to the temporal dimension. 

[0178] An additional matrix can be used to store clinical (e.g., diagnostic) and
demographic information such as age, gender, handedness, EEG, drug regimen (e.g.,
pharmacology), narcotic dependency, pedigree information, place of birth, place of
residence, socio-economic status, race, language (e.g., ability to speak, understand a
particular language and/or exposure to language, e.g., as an infant, child, adult),
WAIS-R, DSM-IV diagnosis, and so forth. Still other types of useful information
include quantitative medical assessments, e.g., blood pressure, pulse, body temperature,
blood cell count, circadian rhythm, height, and other biometric values.

[0179] A further matrix can be used to store genotypic information,
although such information can also be stored separately. This additional matrix may
be only two-dimensional.

[0180] It is appreciated that a matrix can also be represented using other formats
(e.g., an n-dimensional vector) or transformed into other representations (e.g., one or
more tables in a relational database, a text string, and so forth). A set of matrices can also
be represented as a single matrix which has an additional dimension relative to the most
complex matrix in the set.

[0181] FIG. 9 describes an exemplary qualitative systems biology map. The map 
describes different regions, here, the GOb, NAc, SLEA, Amygdala, and VT. The map also
indicates activity of the regions during the expectancy phase of cocaine and monetary 
reward paradigms, and the outcome phase of cocaine, monetary reward, beauty and pain
paradigms. 

[0182] Phenotypic Classifications

[0183] Referring now to the exemplary method 230 in FIG. 18, information about 
a plurality of subjects (e.g., human subjects) can be used to produce phenotypic 
classifications. The method 230 includes randomly selecting 232 a test set and training 
set from the plurality of subjects. For example, if a database includes information about 
1000 subjects, 500 might be used for the training set and the other 500 might be used for 
the test set. Variables for phenotypic information are then analyzed by evaluating 232
autocorrelations between the variables in the training set. The variables that are analyzed
might include, e.g., all variables related to functional and structural information about
neural circuitry. In some embodiments, it may be useful to exclude the diagnostic
and demographic information from the autocorrelation analysis, although in other 
embodiments such information may be included. 

[0184] The results of the autocorrelations are then used to select variables to build
236 a classification tree. For example, the variable with the best autocorrelation score 
can be used to define a rule for the first node of the tree. The variable with the second 
best score may be used to define a rule for the second node of the tree. Details of tree 
building are provided below. Once the tree is complete, the classification is used to 
evaluate the test set. 
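
By way of non-limiting illustration, the random split and the ranking of variables for tree building can be sketched as follows. The text does not define how the autocorrelation score is computed, so the sketch substitutes the mean absolute Pearson correlation of each variable with the other variables, purely as a stand-in. All function names, variable names, and demonstration values are hypothetical.

```python
import random
from math import sqrt


def split_subjects(subject_ids, seed=0):
    """Randomly split subjects into equal-sized training and test sets."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def rank_variables(data, training_ids):
    """Rank variables, best first, by mean |correlation| with the others.

    data maps variable name -> {subject_id: value}. The score is a stand-in
    for the autocorrelation score, which the text leaves unspecified.
    """
    columns = {v: [vals[s] for s in training_ids] for v, vals in data.items()}
    scores = {}
    for v, col in columns.items():
        others = [abs(pearson(col, c)) for w, c in columns.items() if w != v]
        scores[v] = sum(others) / len(others) if others else 0.0
    return sorted(scores, key=scores.get, reverse=True)


train_ids, test_ids = split_subjects(range(1000))

# Hypothetical variables: "a" and "b" move together, "c" is unrelated.
demo = {"a": {0: 1, 1: 2, 2: 3, 3: 4},
        "b": {0: 2, 1: 4, 2: 6, 3: 8},
        "c": {0: 1, 1: -1, 2: -1, 3: 1}}
ranking = rank_variables(demo, [0, 1, 2, 3])
print(len(train_ids), len(test_ids), ranking)
```

The first variable in the ranking would define the rule for the first node of the tree, the second variable the rule for the second node, and so on.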

[0185] Objective scoring can also be used to evaluate the tree. For example,
the sizes of the clusters in the test set and the training set should be reasonably similar,
e.g., within statistically acceptable values. In one embodiment, the tree should have a
statistical significance, e.g., the tree structure is not attributed to chance alone.

[0186] In some implementations, the classification method may achieve one or
more of the following advantages: (i) classifications are obtained by completely objective
criteria, (ii) structural and functional information can be easily integrated as can other
variables, e.g., information from different imaging techniques, different times and
different subjects, (iii) classification is scalable and expandable (e.g., as the number of
available subjects grows), (iv) information is condensed relative to raw acquisition data
or data transformed onto an anatomical model.

[0187] The use of autocorrelations enables one objective approach to selecting
variables for tree classification. Variables that provide high autocorrelation scores are
indicated as being highly informative. However, in some implementations, this objective
approach is combined with a subjective approach, or if desired, in some implementations,
a completely subjective approach can be used. In an exemplary subjective approach,
regions of the brain that are known to connect and interact with regions that score high by 
objective criteria are also used for tree classification, e.g., independent of their own 
autocorrelation score. Regions that are known to be involved or be featured in a 
particular process may also be selected. 

[0188] Tree editing can include pruning branches, e.g., particularly branches that
do not segregate individuals in an informative manner. For example, segregating a
single individual from a group of twenty does not aid the classification process.
Similarly, segregating, in an upper node, five subjects from a group of five hundred may not
inform the classification process. Pruning can be performed manually or automatically.
In one example, association rules are used to test the salience of possible correlations
and to prune off non-informative nodes. In another example of automated pruning, 
branches with asymmetric distributions (e.g., < 10% into one branch) are removed by 
computer software. 
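
By way of non-limiting illustration, automated pruning of asymmetric branches can be sketched as follows. The dictionary-based node layout and the function names are assumptions made for the sketch, and collapsing an uninformative split into a leaf is only one of several ways a branch might be removed.

```python
def is_asymmetric(left_count, right_count, min_fraction=0.10):
    """True if a split sends less than min_fraction of subjects to one branch."""
    total = left_count + right_count
    if total == 0:
        return True
    return min(left_count, right_count) / total < min_fraction


def prune(node, min_fraction=0.10):
    """Collapse splits that fail the asymmetry test into leaves, bottom-up.

    A node is a dict with "count" and, for internal nodes, "left" and "right".
    """
    if "left" not in node:
        return node
    left = prune(node["left"], min_fraction)
    right = prune(node["right"], min_fraction)
    if is_asymmetric(left["count"], right["count"], min_fraction):
        return {"count": node["count"]}
    return {**node, "left": left, "right": right}


# Hypothetical tree: the lower split separates only 5 of 300 subjects.
tree = {"count": 500,
        "left": {"count": 200},
        "right": {"count": 300,
                  "left": {"count": 5},
                  "right": {"count": 295}}}
pruned = prune(tree)
print(pruned)
```

Both examples from the text fail the 10% test: one subject out of twenty is 5%, and five subjects out of five hundred is 1%.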

[0189] The classification process can also be evaluated (e.g., by a user or
software) to determine if it provides explanatory power. For example, a classification 
can be evaluated to determine if it bins known exophenotypes (e.g., clinical diagnoses) 
into subclasses. In another example, a classification is evaluated to determine if there is 
familiality, e.g., whether the classification identifies an endophenotype (see, e.g., below). 
In still another example, the classification continues until one or more particular 
constraints are satisfied. 

[0190] It is also possible to do a recursive process wherein tree branches are
added and pruned during multiple recursive cycles until the tree structure satisfies
particular parameters, e.g., optimization parameters. For example, the tree can be
modified until a cost value for growing the tree exceeds the informational value of the
added complexity.

[0191] In other embodiments, the training and test sets may be different sizes. Or,
in one embodiment, all available data is used to generate the classification tree.

[0192] Endophenotypes

[0193] By evaluating information from a plurality of subjects, it is possible to
identify at least two types of phenotypic markers: endophenotypes and markers of 
disease/disorder progression (MDP). 

[0194] An endophenotype typically includes the following properties: (a) it
provides an internal marker of a probability function for disease susceptibility or
resistance; (b) it is unchanged by illness progression; and (c) it has measurable
heritability / familiality. See, e.g., Almasy and Blangero (2001) Am. J. Med. Genet.
108:42. Thus, endophenotypes may be found (but not necessarily) in unaffected siblings
and parents of a subject who is affected by a disorder. Similarly, the endophenotype can
be present prior to onset of the disorder. Thus, endophenotypes have high diagnostic 
value. An endophenotype may be defined by one or more variables, e.g., one or more 
variables present in a SBM (e.g., a QSBM) described herein. 

[0195] In contrast, a marker of disease/disorder progression (MDP) is changed 
during the progression of a disorder. Such markers can be used to characterize the 
disorder, prescribe or monitor a treatment, and make other decisions (e.g., medical or 
financial decisions).

[0196] A method for evaluating neuropsychiatric phenotypes can include a
longitudinal component which is of great value in differentiating between
endophenotypes and MDPs. Such longitudinal studies include analyzing a subject at a
first time and then analyzing the subject at a later time, e.g., at least one week, one, two,
three, four, six, ten, or twelve months later. For example, the subject might be analyzed
once a year over three to five years. In some embodiments, the subject is evaluated at
approximately regular intervals. During these studies, phenotypic variables that remain
unchanged, but which differ from normal (e.g., which are identified as useful for
classification) are variables that can serve as endophenotypes. If the subject's outward
clinical manifestations of a disorder are changing, other variables detected by evaluating
neural circuit function may also change. Such variables can serve as MDPs.
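
By way of non-limiting illustration, the longitudinal distinction between endophenotypes and MDPs can be sketched as a comparison of two visits. The 5% tolerance for treating a variable as unchanged, the normal ranges, and all variable names are assumptions made for the sketch and do not come from the text.

```python
def classify_variables(first_visit, later_visit, normal_range, tolerance=0.05):
    """Label each variable from two longitudinal measurements.

    first_visit and later_visit map variable -> value; normal_range maps
    variable -> (low, high). tolerance is an assumed relative change below
    which a variable is treated as unchanged.
    """
    labels = {}
    for name, v0 in first_visit.items():
        v1 = later_visit[name]
        low, high = normal_range[name]
        scale = max(abs(v0), abs(v1), 1e-12)
        changed = abs(v1 - v0) / scale > tolerance
        abnormal = not (low <= v0 <= high)
        if changed:
            labels[name] = "MDP candidate"
        elif abnormal:
            labels[name] = "endophenotype candidate"
        else:
            labels[name] = "uninformative"
    return labels


# Hypothetical variables measured one year apart.
first = {"var_a": 2.0, "var_b": 2.0, "var_c": 1.0}
later = {"var_a": 2.0, "var_b": 3.0, "var_c": 1.0}
normal = {"var_a": (0.0, 1.0), "var_b": (0.0, 1.0), "var_c": (0.0, 2.0)}
labels = classify_variables(first, later, normal)
print(labels)
```

In practice more than two time points would be compared, consistent with the yearly evaluations described above.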

[0197] An Integrated System

[0198] Referring to FIG. 21, an exemplary integrated system 300 can be used to 
produce information for a database and generate information about neural circuit activity.
For example, the system can include a network 305 that connects one or more imagers 
350 (e.g., MRI machines) and one or more genotyping stations 340 with a database server 
320. The imagers 350 can deliver raw or processed information to the server 320 with 
information that references an individual (e.g., using an anonymous index). The database 
server 320 also receives similarly referenced information about the individual's genotype 
so that there is an association between the genotypic information and the phenotypic
information obtained by MRI. For example, a datastructure can be used that includes a 
first field with a pointer to the genotypic information of the individual and a second field 
with a pointer to the phenotypic information for the same individual. 
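
By way of non-limiting illustration, a datastructure with one field pointing to genotypic information and a second field pointing to phenotypic information can be sketched as follows. The class name, field names, and reference strings are hypothetical, and a reference here may be any key or path understood by the database server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubjectRecord:
    """Links one anonymized individual to genotypic and phenotypic data."""
    anonymous_index: str  # anonymous reference to the individual
    genotype_ref: str     # pointer (e.g., key or path) to genotypic information
    phenotype_ref: str    # pointer to phenotypic information (e.g., a QSBM)


def link_records(genotype_refs, phenotype_refs):
    """Join two {anonymous_index: ref} maps on the shared anonymous index."""
    shared = sorted(genotype_refs.keys() & phenotype_refs.keys())
    return [SubjectRecord(i, genotype_refs[i], phenotype_refs[i]) for i in shared]


# Hypothetical references delivered by a genotyping station and an imager.
records = link_records({"s1": "geno/1", "s2": "geno/2"},
                       {"s2": "qsbm/2", "s3": "qsbm/3"})
print(records)
```

Only individuals for whom both kinds of information have been received yield a linked record.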

[0199] In one embodiment, the system 300 also includes a statistics engine which
can evaluate the phenotypic information and/or genotypic information, e.g., using a
method described herein. 

[0200] The methods and other features described herein can be implemented in
digital electronic circuitry, or in computer hardware, firmware, software, or in
combinations thereof. Methods can be implemented using a computer program product
tangibly embodied in a machine-readable storage device for execution by a
programmable processor; and method actions can be performed by a programmable
processor executing a program of instructions to perform functions of the invention by
operating on input data and generating output. For example, methods can be
implemented advantageously in one or more computer programs that are executable on a
programmable system including at least one programmable processor coupled to receive
data and instructions from, and to transmit data and instructions to, a data storage system,
at least one input device, and at least one output device. Each computer program can be 
implemented in a high-level procedural or object oriented programming language, or in 
assembly or machine language if desired; and in any case, the language can be a
compiled or interpreted language. Suitable processors include, by way of example, both
general and special purpose microprocessors. A processor can receive instructions and
data from a read-only memory and/or a random access memory. Generally, a computer
will include one or more mass storage devices for storing data files; such devices include 
magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; 
and optical disks. Storage devices suitable for tangibly embodying computer program 
instructions and data include all forms of non-volatile memory, including, by way of 
example, semiconductor memory devices, such as EPROM, EEPROM, and flash memory
devices; magnetic disks such as internal hard disks and removable disks; magneto-optical
disks; and CD-ROM disks. Any of the foregoing can be supplemented by, or
incorporated in, ASICs (application-specific integrated circuits). 

[0201] Data structures, trees, databases, and other information formats described
herein can be stored in a machine accessible memory (e.g., volatile or non-volatile
memory, within a CPU or external to a CPU) or on a machine-readable medium (e.g., a
hard disk, CD-ROM, and so forth).

[0202] An example of one such type of computer is depicted in FIG. 22, which
shows a block diagram of a programmable processing system (system) 510 suitable for
implementing or performing the apparatus or methods of the invention. The system 510
includes a processor 520, a random access memory (RAM) 521, a program memory 522
(for example, a writable read-only memory (ROM) such as a flash ROM), a hard drive
controller 523, and an input/output (I/O) controller 524 coupled by a processor (CPU) bus
525. The system 510 can be preprogrammed, in ROM, for example, or it can be
programmed (and reprogrammed) by loading a program from another source (for
example, from a floppy disk, a CD-ROM, or another computer).

[0203] The hard drive controller 523 is coupled to a hard disk 530 suitable for
storing executable computer programs, including programs embodying the present
invention, and data including storage. The I/O controller 524 is coupled by means of an
I/O bus 526 to an I/O interface 527. The I/O interface 527 receives and transmits data in
analog or digital form over communication links such as a serial link, local area network,
wireless link, and parallel link.

[0204] One non-limiting example of an execution environment includes 
computers running Linux Red Hat OS, Windows XP (Microsoft), Windows NT 4.0 
(Microsoft) or better or Solaris 2.6 or better (Sun Microsystems) operating systems. 
Browsers can be Microsoft Internet Explorer version 4.0 or greater or Netscape Navigator 
or Communicator version 4.0 or greater. Computers for databases and administration 
servers can include Windows NT 4.0 with a 400 MHz Pentium II (Intel) processor or 
equivalent using 256 MB memory and 9 GB SCSI drive. For example, a Solaris 2.6 
Ultra 10 (400 MHz) with 256 MB memory and 9 GB SCSI drive can be used. Other
environments can also be used. 

[0205] Diagnosis

[0206] In one embodiment, a tree classification is produced based on information 
from a plurality of subjects. This tree can be used directly for diagnosing a subject (the
"query subject"), particularly a subject that is not a member of the plurality of subjects
that was used to produce the tree. Information for the query subject can be run through
the tree. 

[0207] For example, if a native dataset for the query subject is received, the 
native dataset can be processed to produce a QSBM that has the same structure as the 
maps used for producing the tree. The query subject's QSBM is then compared to rules 
at each node of the tree to determine where the query subject falls on the tree. By 
proceeding down the tree to a terminal node, this process should indicate which bin or
class the query subject belongs in. If the tree includes a probabilistic or other statistical
function that corresponds to the decision at each node, this process can also produce a
probability or statistical significance for the diagnosis. For example, it is possible to
display a value for each of the possible bins or classes that indicates the probability that
the query subject belongs in that bin or class. (The probabilities should sum to 1.0.)
However, it may not be necessary to explore all the branches of the tree. For example, 
only branches likely to be relevant might be tested. 
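
By way of non-limiting illustration, running a query subject's map through a tree with a probabilistic function at each node can be sketched as follows. The node layout, the per-node value p_correct (the assumed probability that the decision taken at a node is right), the thresholds, and the variable and class names are all hypothetical.

```python
def class_probabilities(node, qsbm, weight=1.0, out=None):
    """Propagate a query subject down a tree and return class probabilities.

    Internal nodes: {"variable", "threshold", "p_correct", "low", "high"}.
    Leaves: {"label"}. The branch chosen by the rule receives p_correct of
    the incoming weight and the other branch receives the remainder.
    """
    if out is None:
        out = {}
    if "label" in node:
        out[node["label"]] = out.get(node["label"], 0.0) + weight
        return out
    goes_high = qsbm[node["variable"]] > node["threshold"]
    p = node["p_correct"]
    high_weight = weight * (p if goes_high else (1 - p))
    low_weight = weight * ((1 - p) if goes_high else p)
    class_probabilities(node["high"], qsbm, high_weight, out)
    class_probabilities(node["low"], qsbm, low_weight, out)
    return out


# Hypothetical two-node tree and query values.
tree = {"variable": "var_1", "threshold": 1.0, "p_correct": 0.9,
        "high": {"label": "class A"},
        "low": {"variable": "var_2", "threshold": 0.5, "p_correct": 0.8,
                "high": {"label": "class B"},
                "low": {"label": "class C"}}}
query = {"var_1": 2.3, "var_2": 0.2}
probs = class_probabilities(tree, query)
print(probs)
```

This sketch explores every branch so that the values sum to 1.0. An implementation that tests only the branches likely to be relevant would stop descending once the weight of a branch falls below a chosen cutoff.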

[0208] In another embodiment, a rules-based function is used to define a class.
Information about the query subject is then compared to one or more rules to produce an 
evaluation indicating whether the query subject belongs in the class. The result of the 
evaluation might again be a probability or other statistic. In this embodiment, it is not 
necessary to sequentially process a set of rules. 

[0209] Exemplary Applications

[0210] There are numerous applications for the methods, data-structures, and 
systems described herein. In one example, the methods can be used to characterize (e.g.,
diagnose) a neuropsychiatric disorder or a propensity or association with a disorder. In
another example, the methods can be used for the discovery of a gene or epigenetic factor
which contributes at least in part to a neuropsychiatric disorder. Such disorders include,
e.g., schizophrenia, manic depression, bipolar disorder, addictions (e.g., substance abuse,
gambling, etc.), obsessive-compulsive disorder, anxiety/paranoia, autism, schizoaffective
disorder, delusional disorder, psychotic disorders not elsewhere specified,
antisocial personality disorder, anorexia/bulimia nervosa, and so on. Similarly, socially
valued traits can also be evaluated, e.g., in individuals gifted with musical talent, charm,
charisma, mathematical ability, persuasion, determination, creativity, and so forth. Once
a gene or epigenetic factor is discovered it can be used as a target for identifying, testing, or
designing pharmacological interventions. 

[0211] Another exemplary application provides a database, which can be used,
e.g., to diagnose, evaluate, and process clinical or commercial information. The methods
described herein can diagnose functional brain disorder using multiple quantitative
variables (e.g., by oversampling information space).

[0212] Still other applications include staging clinical diagnosis of
neuropsychiatric disorders in terms of functional impairment caused by non-psychiatric illness, or
staging a psychiatric illness; detecting non-clinical variants that may appear as clinical
disorders; evaluating and planning treatment of psychiatric illness; monitoring and
evaluating treatment efficacy; intervening in narcotics abuse; and monitoring narcotic
consumption.
[0213] Methods of Evaluating Genetic Material

[0214] There are numerous methods for evaluating genetic material to provide
genetic information. Genetic information can be obtained by evaluating a subject or a 
sample from a subject. The sample typically includes nucleated cells, e.g., somatic cells, 
or nucleic acid extracted from such cells (e.g., genomic DNA or cDNA or mRNA). In 
embodiments in which genomic DNA is used, virtually any biological sample (other than 
pure red blood cells) is suitable. For example, convenient tissue samples include whole 
blood, semen, saliva, tears, urine, fecal material, sweat, buccal, skin and hair. In
embodiments in which cDNA or mRNA is used, the tissue sample usually includes cells 
in which the target nucleic acid is expressed. 

[0215] Nucleic acid samples can be analyzed using biophysical techniques (e.g.,
hybridization, electrophoresis, and so forth), sequencing, enzyme-based techniques, and
combinations thereof.

[0216] For example, hybridization to microarrays can also be used to detect polymorphisms,
including SNPs. In one implementation, a set of different oligonucleotides,
with the polymorphic nucleotide at varying positions within the oligonucleotides can be
positioned on a nucleic acid array. The extent of hybridization as a function of position
and hybridization to oligonucleotides specific for the other allele can be used to
determine whether a particular polymorphism is present. See, e.g., U.S. 6,066,454.

[0217] In one implementation, hybridization probes can include one or more
additional mismatches to destabilize duplex formation and sensitize the assay. The
mismatch may be directly adjacent to the query position, or within 10, 7, 5, 4, 3, or 2
nucleotides of the query position. Hybridization probes can also be selected to have a
particular Tm, e.g., between 45-60°C, 55-65°C, or 60-75°C. In a multiplex assay, Tm's
can be selected to be within 5, 3, or 2°C of each other, e.g., probes for a genetic marker
can be selected with these criteria.
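
By way of non-limiting illustration, selecting probes by Tm for a multiplex assay can be sketched as follows. The text does not say how Tm is estimated, so the sketch assumes the rough Wallace rule for short oligonucleotides (2°C per A or T and 4°C per G or C); a nearest-neighbor model would be more accurate. The function names and probe sequences are hypothetical.

```python
def wallace_tm(probe):
    """Rough Tm estimate in degrees C: 2*(A+T) + 4*(G+C) (Wallace rule)."""
    probe = probe.upper()
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    return 2 * at + 4 * gc


def select_multiplex(probes, tm_range=(45, 60), max_spread=5):
    """Keep probes whose Tm lies in tm_range and within max_spread of each other.

    Probes are sorted by Tm, and the largest group whose Tm values span no
    more than max_spread degrees is returned.
    """
    in_range = sorted((wallace_tm(p), p) for p in probes
                      if tm_range[0] <= wallace_tm(p) <= tm_range[1])
    best = []
    for i in range(len(in_range)):
        run = [tp for tp in in_range[i:] if tp[0] - in_range[i][0] <= max_spread]
        if len(run) > len(best):
            best = run
    return [p for _, p in best]


# Hypothetical probe sequences, for illustration only.
candidates = ["ACGTACGTACGTACGT",       # estimated Tm 48
              "ATATATATATATATAT",       # estimated Tm 32, below the range
              "GCGCGCGCGCGCGCGC",       # estimated Tm 64, above the range
              "ACGTACGTACGTACGTA"]      # estimated Tm 50
selected = select_multiplex(candidates)
print(selected)
```

The defaults correspond to the 45-60°C range and the 5°C spread mentioned above; the other ranges and spreads can be passed as arguments.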

[0218] U.S. Pat. No. 5,837,832 describes a tiling method for array fabrication
whereby probes are synthesized on a solid support. These arrays include a set of
oligonucleotide probes such that, for each base in a specific reference sequence, the set
includes a first probe (for example, a so-called "wild-type" or "WT" probe) that is exactly
complementary to a section of the sequence of the chosen fragment including the base of
interest in a first allele and at least one additional probe (called a "substitution probe"),
which is identical to the WT probe except that the base of interest has been replaced by
one of a predetermined set of nucleotides (typically, one, two or three nucleotides), i.e.,
nucleotides other than the nucleotide in the first probe, for example a nucleotide
complementary to a second allele. Probes may be synthesized to query each base in the
sequence of the chosen fragment or a particular base known to be polymorphic. Target
nucleic acid sequences which hybridize to a substitution probe on the array
indicate the presence of a single nucleotide polymorphism. See also,
e.g., U.S. 5,858,659; 5,861,242; 5,593,839 and 5,856,101 (describing, e.g., variously
methods of using computers to design arrays and lithographic masks and methods of
detecting insertions and deletions).
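
By way of non-limiting illustration, generating a WT probe and its substitution probes for one base of a reference sequence can be sketched as follows. The sketch is not the method of the cited patent: the 15-nucleotide probe length, the centering of the base of interest, the function names, and the reference sequence are assumptions made for the example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def tiling_probes(reference, position, length=15):
    """Return the WT probe and its substitution probes for one base.

    position is the 0-based index of the base of interest in reference. The
    probe window is centered on that base (an assumed layout), and probes are
    the reverse complement of the window.
    """
    half = length // 2
    start = position - half
    if start < 0 or start + length > len(reference):
        raise ValueError("probe window falls outside the reference sequence")
    window = reference[start:start + length]
    wt = reverse_complement(window)
    substitutions = []
    for base in "ACGT":
        if base != window[half]:
            variant = window[:half] + base + window[half + 1:]
            substitutions.append(reverse_complement(variant))
    return wt, substitutions


# Hypothetical reference sequence; the base of interest is at index 10.
reference = "ACGTACGTACGTACGTACGT"
wt, subs = tiling_probes(reference, 10)
print(wt, subs)
```

Each substitution probe differs from the WT probe at exactly one position, so comparing hybridization signals across the four probes indicates which base is present in the target.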

[0219] The design and use of allele-specific probes for analyzing polymorphisms
is described by e.g., Saiki et al., Nature 324, 163-166 (1986); Dattagupta, EP 235,726,
Saiki, WO 89/11548. Allele-specific probes can be designed that hybridize to a segment
of target DNA from one individual but do not hybridize to the corresponding segment
from another individual due to the presence of different polymorphic forms in the
respective segments from the two individuals. Hybridization conditions should be
sufficiently stringent that there is a significant difference in hybridization intensity
between alleles, and preferably an essentially binary response, whereby a probe
hybridizes to only one of the alleles. In one embodiment, probes are designed to
hybridize to a segment of target DNA such that the polymorphic site aligns with a central
position (e.g., in a 15-mer at the 7 position; in a 16-mer, at either the 8 or 9 position) of
the probe. This design of probe achieves good discrimination in hybridization between
different allelic forms. In one embodiment, the probes include a second mismatch which
is non-complementary to both alleles of a biallelic pair. The second mismatch serves to
destabilize the duplex, reduce Tm, and increase sensitivity.

[0220] Allele-specific probes are often used in pairs, one member of a pair 
showing a perfect match to a reference form of a target sequence and the other member
showing a perfect match to a variant form. Several pairs of probes can then be
immobilized on the same support for simultaneous analysis of multiple polymorphisms
within the same target sequence. 

[0221] Other hybridization based techniques include sequence specific primer
binding (e.g., PCR or LCR); Southern analysis of DNA, e.g., genomic DNA; Northern
analysis of RNA, e.g., mRNA; fluorescent probe based techniques (see, e.g., Beaudet et
al (2001) Genome Res. 11(4):600-8); and allele specific amplification. Enzymatic
techniques include restriction enzyme digestion; sequencing; and single base extension 
(SBE). These and other techniques are well known to those skilled in the art. 

[0222] Electrophoretic techniques include capillary electrophoresis and Single-Strand
Conformation Polymorphism (SSCP) detection (see, e.g., Myers et al (1985)
Nature 313:495-8 and Ganguly (2002) Hum Mutat. 19(4):334-42). Other biophysical
methods include denaturing high pressure liquid chromatography (DHPLC). For
example, different alleles can be identified based on the different sequence-dependent
melting properties and electrophoretic migration of DNA in solution. Erlich, ed., PCR
Technology, Principles and Applications for DNA Amplification, (W.H. Freeman and
Co, New York, 1992), Chapter 7. Alleles of target sequences can also be differentiated
using single-strand conformation polymorphism analysis, which identifies base
differences by alteration in electrophoretic migration of single stranded PCR products, as
described in Orita et al., Proc. Nat. Acad. Sci. 86, 2766-2770 (1989). Amplified PCR
products can be generated as described above, and heated or otherwise denatured, to form
single stranded amplification products. Single-stranded nucleic acids may refold or form
secondary structures which are partially dependent on the base sequence. The different
electrophoretic mobilities of single-stranded amplification products can be related to
base-sequence differences between alleles of target sequences.

[0223] In one embodiment, allele specific amplification technology that depends 
on selective PCR amplification may be used to obtain genetic information. 
Oligonucleotides used as primers for specific amplification may carry the mutation of 
interest in the center of the molecule (so that amplification depends on differential 
hybridization) (Gibbs et al. (1989) Nucleic Acids Res. 17:2437-2448) or at the extreme 3' 
end of one primer where, under appropriate conditions, mismatch can prevent or reduce 
polymerase extension (Prossner (1993) Tibtech 11:238). See also, e.g., WO 93/22456. 
In one embodiment, the allele specific primer is used in conjunction with a second primer 
which hybridizes at a distal site. Amplification proceeds from the two primers, resulting 
in a detectable product which indicates the particular allelic form is present. A control is 
usually performed with a second pair of primers, one of which shows a single base 
mismatch at the polymorphic site and the other of which exhibits perfect 
complementarity to a distal site. 
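
The 3'-terminal primer design can be illustrated with a short sketch. The function names are hypothetical, the primer is written in the sense of the strand it extends, and the all-or-nothing extension rule is an idealization (in practice a mismatch may only reduce extension, as noted above).

```python
def three_prime_primer(template, snp_index, allele_base, length=20):
    """Hypothetical helper: primer whose 3'-terminal base sits on the
    polymorphic site and matches `allele_base`."""
    start = snp_index - length + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for this primer length")
    return template[start:snp_index] + allele_base


def expect_product(primer, sample_allele):
    """Idealized rule: a product is expected only when the primer's
    3' base matches the allele present in the sample."""
    return primer[-1] == sample_allele


if __name__ == "__main__":
    template = "ACGTACGTACGTACGTACGTACGTA"
    primer_a = three_prime_primer(template, 22, "A")
    print(primer_a, expect_product(primer_a, "A"), expect_product(primer_a, "G"))
```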

[0224] In addition, it is possible to introduce a restriction site in the region of the 
mutation to create cleavage-based detection (Gasparini et al. (1992) Mol Cell Probes 
6:1). In another embodiment, amplification can be performed using Taq ligase for 
amplification (Barany (1991) Proc. Natl. Acad. Sci. USA 88:189). In such cases, ligation 
will occur only if there is a perfect match at the 3' end of the 5' sequence, making it 
possible to detect the presence of a known mutation at a specific site by looking for the 
presence or absence of amplification. 

[0225] Enzymatic methods for detecting sequences include amplification-based 
methods such as the polymerase chain reaction (PCR; Saiki et al. (1985) Science 230, 
1350-1354) and ligase chain reaction (LCR; Wu et al. (1989) Genomics 4, 560-569; 
Barringer et al. (1990) Gene 89, 117-122; F. Barany (1991) Proc. Natl. Acad. Sci. 
USA 88, 189-193); transcription-based methods that utilize RNA synthesis by RNA 
polymerases to amplify nucleic acid (U.S. Pat. No. 6,066,457; U.S. Pat. No. 6,132,997; 
U.S. Pat. No. 5,716,785; Sarkar et al., Science (1989) 244:331-34; Stoflet et al., Science 
(1988) 239:491); NASBA (U.S. Patent Nos. 5,130,238; 5,409,818; and 5,554,517); 
rolling circle amplification (RCA; U.S. Patent Nos. 5,854,033 and 6,143,495); and strand 
displacement amplification (SDA; U.S. Patent Nos. 5,455,166 and 5,624,825). 
Amplification methods can be used in combination with other techniques. 

[0226] Other enzymatic techniques include sequencing using polymerases, e.g., 
DNA polymerases and variations thereof such as single base extension technology. See, 
e.g., U.S. 6,294,336; U.S. 6,013,431; and U.S. 5,952,174. For example, Chen et al. 
(PNAS 94:10756-61 (1997)) describes a locus-specific oligonucleotide primer labeled on 
the 5' terminus with 5-carboxyfluorescein (FAM). This labeled primer is designed so that 
the 3' end is immediately adjacent to the polymorphic site of interest. The labeled primer 
is hybridized to the locus, and single base extension of the labeled primer is performed 
with fluorescently-labeled dideoxyribonucleotides (ddNTPs) in dye-terminator 
sequencing fashion. An increase in fluorescence of the added ddNTP in response to 
excitation at the wavelength of the labeled primer is used to infer the identity of the added 
nucleotide. 

[0227] Another method to identify SNPs is called single nucleotide primer 
extension (SNuPE) or minisequencing (Nikiforov et al., Nucleic Acids Res., 22: 4167-75 
(1994); Pastinen et al., Clin. Chem., 42: 1391-7 (1996); Landegren et al., Genome Res., 
8: 769-76 (1998); Kuppuswamy et al., Proc. Natl. Acad. Sci. U.S.A., 88: 1143-7 (1991)). 
This technique involves the hybridization of a primer immediately adjacent to the 
polymorphic locus, extension by a single dideoxynucleotide, and identification of the 
extended primer. All variable nucleotides can be identified with optimal discrimination 
using the same reaction conditions (Pastinen et al., Genome Res., 7: 606-14 (1997)). 
Related detection methods include luminous detection (Nyren et al., Anal. Biochem., 
208: 171-5 (1993)), colorimetric ELISA (Nikiforov et al., Nucleic Acids Res., 22: 4167-75 
(1994)), gel-based fluorescent assays (Pastinen et al., Clin. Chem., 42: 1391-7 (1996)), 
homogeneous fluorescent detection (Chen et al., Genet. Anal., 14: 157-63 (1999)), flow 
cytometry-based assays (Cai et al., Genomics, 66: 135-43 (2000)), and high performance 
liquid chromatography (HPLC) analysis (Hoogendoorn et al., Hum. Genet., 104: 89-93 
(1999)). 

[0228] Mass spectroscopy (e.g., matrix assisted laser desorption ionization-time 
of flight (MALDI-TOF) mass spectroscopy) can be used to detect nucleic acid 
polymorphisms. In one embodiment (e.g., the MassEXTEND™ assay, SEQUENOM, 
Inc.), a selected nucleotide mixture, missing at least one dNTP and including a single 
ddNTP, is used to extend a primer that hybridizes near a polymorphism. The nucleotide 
mixture is selected so that the extension products between the different polymorphisms at 
the site create the greatest difference in molecular size. The extension reaction is placed 
on a plate for mass spectroscopy analysis. See, e.g., Haff et al., Genome Res., 7: 378-88 
(1997); Griffin et al., Trends Biotechnol., 18: 77-84 (2000); Sauer et al., Nucleic Acids 
Res., 28: E13 (2000). 
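
The selection of a nucleotide mixture can be illustrated with a simplified sketch. It is not the commercial assay's algorithm: the function names are hypothetical, product length (number of bases added) is used as a stand-in for molecular size, and each allele is given as the series of bases that would be added to the primer.

```python
from itertools import combinations

BASES = "ACGT"


def extend(downstream, dntps, ddntp):
    """Bases added to the primer: copy `downstream` until the ddNTP base is
    incorporated (chain termination) or a base absent from the mix is needed."""
    added = ""
    for base in downstream:
        if base == ddntp:
            return added + base      # terminator incorporated, extension stops
        if base not in dntps:
            return added             # required nucleotide missing, extension stalls
        added += base
    return added


def best_mix(allele_a, allele_b):
    """Search mixes (dNTP subset plus one ddNTP) for the largest length gap.

    Returns (gap, dNTP set, ddNTP) for the first mix reaching the maximum.
    """
    best = (-1, None, None)
    for ddntp in BASES:
        others = [b for b in BASES if b != ddntp]   # the ddNTP base has no dNTP
        for k in range(len(others) + 1):
            for dntps in combinations(others, k):
                gap = abs(len(extend(allele_a, dntps, ddntp))
                          - len(extend(allele_b, dntps, ddntp)))
                if gap > best[0]:
                    best = (gap, set(dntps), ddntp)
    return best


if __name__ == "__main__":
    print(best_mix("AGGTC", "CGGTC"))
```

A fuller treatment would compare actual product masses rather than lengths.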


Attorney Docket No. 00786-813P01 

[0229] Fluorescence based detection can also be used to detect nucleic acid 
polymorphisms. For example, different terminator ddNTPs can be labeled with different 
fluorescent dyes. A primer can be annealed near or immediately adjacent to a 
polymorphism, and the nucleotide at the polymorphic site can be detected by the type 
(e.g., "color") of the fluorescent dye that is incorporated. 
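
A genotype call from dye signals might be sketched as below. The dye-to-base assignment, the normalized intensities, and the threshold are all hypothetical values chosen for illustration.

```python
# Hypothetical dye assignment; actual dye/base pairings depend on the chemistry used.
DYE_TO_BASE = {"blue": "A", "green": "C", "yellow": "G", "red": "T"}


def call_genotype(signals, threshold=0.2):
    """signals maps dye name -> normalized intensity (0..1).

    One dye above threshold gives a homozygous call, two give a
    heterozygous call, anything else is reported as no call (None).
    """
    bases = sorted(DYE_TO_BASE[dye] for dye, value in signals.items()
                   if value >= threshold)
    if len(bases) == 1:
        return bases[0] * 2
    if len(bases) == 2:
        return "".join(bases)
    return None


if __name__ == "__main__":
    print(call_genotype({"blue": 0.9, "green": 0.02, "yellow": 0.01, "red": 0.03}))
    print(call_genotype({"blue": 0.5, "green": 0.02, "yellow": 0.01, "red": 0.45}))
```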

[0230] It is also possible to directly sequence the nucleic acid for a particular 
genetic locus, e.g., by amplification and sequencing, or amplification, cloning, and 
sequencing. The direct analysis of the sequence can be accomplished, e.g., using either the 
dideoxy chain termination method or the Maxam-Gilbert method (see Sambrook et al., 
Molecular Cloning, A Laboratory Manual (2nd Ed., CSHP, New York 1989); Zyskind et 
al., Recombinant DNA Laboratory Manual (Acad. Press, 1988)). High throughput 
automated (e.g., capillary or microchip based) sequencing apparatuses can be used. In still 
other embodiments, the sequence of a protein of interest is analyzed to infer its genetic 
sequence. Methods of analyzing a protein sequence include protein sequencing, mass 
spectroscopy, sequence/epitope specific immunoglobulins, and protease digestion. 

[0231] Any combination of the above methods can also be used. For example, 
allele specific technology can be used in combination with microarrays. See, e.g., 
U.S. 6,287,778. 

[0232] Exemplary genetic markers (e.g., polymorphisms) can be found from 
publicly available resources. Such resources include: the Whitehead Institute's integrated 
maps of the human genome (e.g., the WICGR map, Cambridge MA) which provide 
aligned chromosome maps of genetic markers; other sequence tagged sites (STSs); 
radiation hybrid map data; CEPH yeast artificial chromosome (YAC) clones; the Genetic 
Annotation Initiative (web site: cgap.nci.nih.gov/GAI/; an NIH run site which contains 
information on candidate SNPs); dbSNP Polymorphism Repository (world wide web site: 
ncbi.nlm.nih.gov/SNP/; a comprehensive NIH-run database containing information on 
SNPs and also haplotypes); HUGO Mutation Database Initiative (web site: 
ariel.ucs.unimelb.edu.au:80/~cotton/mdi.htm; a database with information about 
human mutations including SNPs); Human SNP Database (world wide web site: 
genome.wi.mit.edu/SNP/human/index.html; managed by the Whitehead Institute for 
Biomedical Research Genome Institute, this site contains information about SNPs); SNPs 
in the Human-Genome SNP database (world wide web site: ibc.wustl.edu/SNP; providing 
access to SNPs that have been organized by chromosomes and cytogenetic location 
from Washington University); HGBase (web site: hgbase.cgr.ki.se/; a summary of 
sequence variations in the human genome from the Karolinska Institute of Sweden); the 
SNP Consortium Database (web site: snp.cshl.org/db/snp/map; a collection of SNPs and 
related information resulting from a collaborative effort); and GeneSNPs (world wide web 
site: genome.utah.edu/genesnps/; from the University of Utah and U.S. National Institute 
of Environmental Health). Many exemplary biallelic markers are also described in 
publications; see, e.g., U.S. Serial No. 60/206,615, U.S. Serial No. 60/216,745, WIPO 
Serial No. PCT/IB00/00184, WIPO Serial No. PCT/IB98/01193, PCT Publication No. 
WO 99/54500, WIPO Serial No. PCT/IB00/00403, US 2002-0037508, and US 
2002-0032319. 

[0233] The following are some examples of types of polymorphisms: A 
restriction fragment length polymorphism (RFLP) is a variation in DNA sequence that 
alters the length of a restriction fragment (Botstein et al., Am. J. Hum. Genet. 32, 314-331 
(1980)). The restriction fragment length polymorphism may create or delete a restriction 
site, thus changing the length of the restriction fragment. RFLPs have been widely used 
in human and animal genetic analyses (see WO 90/13668; WO 90/11369; Donis-Keller, 
Cell 51, 319-337 (1987); Lander et al., Genetics 121, 85-99 (1989)). Other 
polymorphisms take the form of short tandem repeats (STRs) that include tandem di-, tri- 
and tetra-nucleotide repeated motifs. These tandem repeats are also referred to as variable 
number tandem repeat (VNTR) polymorphisms. VNTRs have been used in identity and 
paternity analysis (US 5,075,217; Armour et al., FEBS Lett. 307, 113-115 (1992); Horn 
et al., WO 91/14003; Jeffreys, EP 370,719), and in a large number of genetic mapping 
studies. 
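
How a sequence change that deletes a restriction site alters fragment lengths can be shown with a small sketch. The function name and the toy sequences are hypothetical; the EcoRI recognition site GAATTC, cut after the first G, is used as a familiar example.

```python
def fragment_lengths(seq, site, cut_offset):
    """Lengths of the fragments produced by cutting a linear sequence
    at every occurrence of `site` (cut placed `cut_offset` bases into it)."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    reference = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 5
    # A single-base change (GAATTC -> GAGTTC) deletes the second site.
    variant = "A" * 10 + "GAATTC" + "C" * 20 + "GAGTTC" + "T" * 5
    print(fragment_lengths(reference, "GAATTC", 1))
    print(fragment_lengths(variant, "GAATTC", 1))
```

The two alleles give different fragment patterns, which is what an RFLP assay detects.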

[0234] Other polymorphisms take the form of single nucleotide variations 
between individuals of the same species. Such polymorphisms are far more frequent than 
RFLPs, STRs and VNTRs. Some single nucleotide polymorphisms (SNPs) occur in 
protein-coding nucleic acid sequences (coding sequence SNP (cSNP)), in which case one 
of the polymorphic forms may give rise to the expression of a defective or otherwise 
variant protein and, potentially, a genetic disease. Examples of genes in which 
polymorphisms within coding sequences give rise to genetic disease include β-globin 
(sickle cell anemia), apoE4 (Alzheimer's Disease), Factor V Leiden (thrombosis), and 
CFTR (cystic fibrosis). cSNPs can alter the codon sequence of the gene and therefore 
specify an alternative amino acid. Such changes are called "missense" when another 
amino acid is substituted, and "nonsense" when the alternative codon specifies a stop 
signal in protein translation. When the cSNP does not alter the amino acid specified, the 
cSNP is called "silent". Other single nucleotide polymorphisms occur in noncoding 
regions. Some of these polymorphisms may also result in defective protein expression 
(e.g., as a result of defective splicing). Other single nucleotide polymorphisms have no 
effect, e.g., no phenotypic effect. 
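
The missense/nonsense/silent distinction can be made concrete with a short sketch based on the standard genetic code. The function name is hypothetical, and codons are written as DNA in the coding-strand sense.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def classify_csnp(ref_codon, var_codon):
    """Classify a coding SNP by its effect on the encoded amino acid."""
    ref_aa, var_aa = CODON_TABLE[ref_codon], CODON_TABLE[var_codon]
    if ref_aa == var_aa:
        return "silent"
    if var_aa == "*":          # '*' marks a stop codon
        return "nonsense"
    return "missense"


if __name__ == "__main__":
    print(classify_csnp("GAG", "GTG"))   # Glu -> Val
    print(classify_csnp("TGG", "TGA"))   # Trp -> stop
    print(classify_csnp("GAA", "GAG"))   # Glu -> Glu
```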

[0235] Pharmacology and pharmacogenomics 

[0236] It is also possible to use the methods described herein to evaluate 
phenotypes (e.g., by imaging) of a subject undergoing a treatment. Differences in 
phenotype can be detected by classification (e.g., classification trees). Then associations 
with a particular genotype can be detected. Other strategies (e.g., in FIG. 19) can also be 
applied, e.g., in combination with the data analysis methods and data structures described 
herein. Exemplary treatments include administering an agent (e.g., a medicament) and 
non-invasive treatments (e.g., hypnosis, psychotherapy, etc.). Homeopathic and 
traditional medicines as well as social behaviors can be similarly analyzed. 
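
One step of a classification tree (a single recursive-partitioning split) can be sketched as follows. This is an illustrative simplification, not the specific procedure of the disclosure: the function names are hypothetical, a single quantitative phenotype measure is used, labels are coded 0/1 (e.g., untreated/treated), and Gini impurity is assumed as the splitting criterion.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Threshold on one measure that minimizes the weighted Gini impurity.

    Returns (threshold, impurity); threshold is None if no split improves
    on the impurity of the unsplit group.
    """
    pairs = sorted(zip(values, labels))
    best_thr, best_score = None, gini(labels)
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue
        thr = (v1 + v2) / 2
        left = [lab for v, lab in pairs if v <= thr]
        right = [lab for v, lab in pairs if v > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_thr, best_score = thr, score
    return best_thr, best_score


if __name__ == "__main__":
    measure = [1, 2, 3, 9, 10, 11]     # e.g., a regional signal index
    treated = [0, 0, 0, 1, 1, 1]
    print(best_split(measure, treated))
```

A full tree would apply the same search recursively to each resulting subgroup.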

[0237] In one embodiment, recursive partitioning is used in a study to do 
pharmacogenetics, e.g., using subjects undergoing a treatment (e.g., medication or a 
non-invasive therapy). Classification trees can be used to determine if subjects respond 
differently to a treatment. Or the classification can be done blind, e.g., by evaluating treated 
subjects and controls to detect if significant classifications are objectively made that 
discriminate between treated and untreated subjects (e.g., humans). 

[0238] Imaging 

[0239] An exemplary method for imaging a subject can include positioning 
subjects to be tested (e.g., persons who are undergoing a paradigm) and instructing the 
subjects to remain as still as possible. Information about the brain is then acquired. A 
measuring apparatus which non-invasively obtains information about the brain (e.g., 
structure and/or function) is used. In one embodiment, the subject to be tested is placed in 
a brain scanner, e.g., an MRI, fMRI, MEG, fCT, OI, SPECT, or PET system. 
The imaged information can be acquired while the subject undergoes an experimental 
paradigm focused on one or more "motivation/emotion" processes. Alternatively, signals 
can be acquired while the subject is exposed to certain stimuli (e.g., the subject views 
photographs of people or food or consumer products) or while the subject performs 
particular tasks (e.g., presses a bar to get a particular result). Alternatively still, the subject 
can perform two or more of the above tasks while the CNS signals are obtained. 
The signals are statistically analyzed and localized to specific anatomical and functional 
brain regions. The details of the processes for statistically analyzing the CNS signals and 
localizing the signals to specific brain regions can vary. 

[0240] Referring now to the exemplary apparatus in FIG. 23, a noninvasive 
measurement apparatus and system for measuring indices of brain activity is described, 
e.g., as follows. In this particular example, a magnetic resonance imaging (MRI) system 
215 that may be programmed to non-invasively aid in the determination of indices of 
brain activity during motivational and emotional function in accordance with the present 
invention is shown. It should be appreciated, however, that other techniques including but 
not limited to fMRI, PET, OI, SPECT, CT, fCT, MRS, MEG and EEG may also be used 
to non-invasively measure indices of brain activity during motivational and emotional 
function. 

[0241] MRI system 215 includes a magnet 216 having gradient coils 216a and RF 
coils 216b disposed thereabout in a particular manner to provide a magnet system 217. In 
response to control signals provided from a control processor 218, a transmitter 219 
provides a signal to the RF coil 216b through an RF power amplifier 220. A gradient 
amplifier 221 provides a current to the gradient coils 216a also in response to signals 
provided by the control processor 218. 

[0242] For generating the uniform, steady magnetic field required for MRI, the 
magnet system 217 may be provided having resistive or superconducting coils 
which are driven by a generator. The magnetic fields are generated in an examination or 
scanning space or region 222 in which the object to be examined is disposed. For 
example, if the object is a person or patient to be examined, the person or portion of the 
person to be examined is disposed in the region 222. 

[0243] The transmitter/amplifier combination 219, 220 drives the coil 216b. After 
activation of the transmitter coil 216b, spin resonance signals are generated in the object 
situated in the examination space 222, which signals are detected and collected by a 
receiver 223. Depending upon the measuring technique to be executed, the same coil can 
be used as the transmitter coil and the receiver coil, or use can be made of separate coils 
for transmission and reception. The detected resonance signals are sampled and digitized in a 
digitizer/array processor 224. The digitizer/array processor 224 converts the analog signals to 
a stream of digital bits which represent the measured data and provides the bit stream to 
the control processor 218. 
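
The conversion of sampled analog values to a bit stream can be illustrated generically. The sketch below is not a description of the digitizer/array processor 224 itself: the function names, voltage range, and bit depth are assumptions, and uniform quantization with clipping is used as a textbook example.

```python
def digitize(samples, v_min, v_max, bits):
    """Map analog sample values to unsigned integer codes of `bits` bits
    using uniform quantization; out-of-range samples are clipped."""
    levels = (1 << bits) - 1
    codes = []
    for s in samples:
        clipped = min(max(s, v_min), v_max)
        codes.append(round((clipped - v_min) / (v_max - v_min) * levels))
    return codes


def to_bitstream(codes, bits):
    """Concatenate the codes as fixed-width binary strings."""
    return "".join(format(code, f"0{bits}b") for code in codes)


if __name__ == "__main__":
    codes = digitize([-1.0, 1.0, 2.0], v_min=-1.0, v_max=1.0, bits=4)
    print(codes, to_bitstream(codes, 4))
```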

[0244] A display 226 coupled to the control processor 218 is provided for the 
display of the reconstructed image. The display 226 may be provided, for example, as a 
monitor or a terminal, such as a CRT or flat panel display. 

[0245] A user provides scan and display operation commands and parameters to 
the control processor 218 through a scan interface 228 and a display operation interface 
230, each of which provides means for a user to interface with and control the operating 
parameters of the MRI system 215 in a manner well known to those of ordinary skill in 
the art. 

[0246] The control processor 218 can be coupled to a signal processor 232 and a 
data store 236. The signal processor can be programmed according to a method 
described herein, e.g., to process raw image information. The processing can include 
localizing signals to a particular region of the brain. 

[0247] Some Exemplary Brain Circuits 

[0248] Brain circuitry includes a prefrontal and sensory cortex. The prefrontal 
cortex includes medial prefrontal cortex and lateral prefrontal cortex. The region also 
includes the primary sensory and motor components. These components include the 
primary somatosensory cortex (S1), the secondary somatosensory cortex (S2), the 
primary motor cortices (M1), and secondary motor cortices (M2). Motor behavior 
involves regions such as M1 and M2, along with supplementary motor cortex (SMA). 
The frontal eye fields (102h) modulate motor aspects of eye control relating to directing 
the reception of visual signals from the environment to the brain. 

[0249] Brain circuitry also includes the thalamus region, the dorsal striatum region, 
and the lateral and medial temporal cortex regions. The medial temporal cortex region 
includes, for example, the hippocampus, the basolateral amygdala, and the entorhinal 
cortex. Also included as part of the brain circuitry are paralimbic regions which include, 
for example, the insula, the orbital cortex, the parahippocampus, and the anterior 
cingulate. Current perspectives of reward circuitry also include the hypothalamus, the 
ventral pallidum, and a plurality of regions collectively designated. 

[0250] The regions collectively designated comprise the nucleus accumbens 
(NAc), the central amygdala, the sublenticular extended amygdala of the basal forebrain 
(SLEA/basal forebrain or SLEA/BF), the ventral tegmentum (ventral tier), and the ventral 
tegmentum (dorsal tier). 

[0251] The regions collectively represent a number of regions having significant 
involvement in motivational and emotional processing. It should be appreciated that other 
components such as the basolateral amygdala are also important but not included in the 
regions designated by reference number. Other regions that are further important to this 
type of processing include the hypothalamus, the orbitofrontal cortex, the insula, and the 
anterior cingulate cortex. Further regions are also important but listed separately, such as 
the ventral pallidum, the thalamus, the dorsal striatum, the hippocampus, the medial 
prefrontal cortex, and the lateral prefrontal cortex. Not listed in this figure but also 
involved in processing sensory information for its emotional implications is the 
cerebellum. 

[0252] The functional contribution of each of these major regions is discussed 
below. It should be noted that what follows is a gross simplification and does not convey 
the complexity or the diversity of the functions in which these regions have been implicated 
and to which they may in the future be connected. Further note that there is currently a debate 
regarding the modular vs. non-modular function of these brain regions, i.e., whether a specific 
function can be attributed to each region in isolation. Accordingly, what is listed below is 
information which provides one of ordinary skill in the art with the understanding that 
a given function may be mediated by the connection of a region with many other regions 
(i.e., the function is mediated by a distributed set of regions, of which the identified region 
is a fundamental component). 

[0253] As a brain region, the NAc has previously been implicated in the 
processing of rewarding/addicting stimuli, and is thought to have a number of functions 
with regard to probability assessments and reward evaluation. It has also been 
implicated in the moment by moment modulation of behavior (e.g., initiation of 
behavior). Signals measured from the NAc are shown and described below in conjunction 
with FIGS. 3A-3D. 

[0254] The SLEA/BF has been implicated in reward evaluation, based on its 
likely role in brain stimulation reward effects. It is thought to be important for estimating 
the intensity of a reward value. It and other sections of the basal forebrain appear to be 
important for the processing of emotional stimuli in general, and it has been implicated in 
drug addiction. 

[0255] Like the NAc, the amygdala has been implicated in the processing of 
emotional information as well as the processing of pain and analgesia information. The 
amygdala has been implicated in both the orienting to and the memory of motivationally 
salient stimuli across the entire spectrum from aversion to reward. It may be important 
for the processing of signals with social salience in real time. In this context it is often 
referred to with regard to fear. A number of its anatomical connections to primary 
sensory cortices suggest that it is important for the modulation of attention to 
motivationally salient stimuli. 

[0256] With respect to the VT/PAG, dopaminergic projections are present from 
the VT to the SLEA, the orbitofrontal cortex, the amygdala, and the NAc. Indeed, 
dopaminergic projections go to most subcortical and prefrontal sites. In FIG. 3, the 
fundamental importance of the VT/PAG projection is focused on the NAc, central 
amygdala, and SLEA/BF, though it also projects to other regions. The VT has been 
implicated in reward prediction processes, motor functions and a number of learning 
processes around motivational events in general. The PAG has also been implicated as a 
modulator of pain stimuli, for example, and may therefore be a region that signals early 
information on rewarding or aversive stimuli. 

[0257] The GOb component of the prefrontal cortex has been implicated in a 
number of cognitive, memory, and planning functions around emotional stimuli or 
regarding rewarding or aversive outcomes in animal and human studies. This section of 
the prefrontal cortex has also been implicated in modulating pain. It has afferent and 
efferent connections with a number of subcortical structures. The GOb is involved in a 
number of different reward processes including those of expectancy determination and 
reward valuation. Patients with lesions in this region tend to have impulse control 
problems. 

[0258] The hypothalamus is involved in the monitoring and maintenance of 
homeostatic systems. It has also been implicated in the evaluation of the relevance 
of rewarding and aversive stimuli in order to maintain homeostatic equilibrium. The 
hypothalamus is highly important for meeting the objectives which optimize fitness over 
time and meet the requirements necessary for survival. 

[0259] The cingulate cortex has been interpreted to be involved in attention and 
planning, the processing of pain unpleasantness, the processing of reward events and 
emotions in general, and the evaluation of emotional conflict. The cingulate cortex is an 
extensive region of brain cortex and appears to have emotional and cognitive 
subdivisions, to name a few. 

[0260] The insula has been implicated in a number of functions including the 
processing of emotional stimuli, the processing of somatosensory functions, and the 
processing of visceral function. 

[0261] The thalamus is composed of a number of sub-nuclei which have been 
implicated in a diverse range of functions. Fundamental among these functions appears to 
be that of being an informational relay of sensory and other information between the 
external and internal environment. It has also been directly implicated in both rewarding 
and aversive processes, and damage to the structure may result in dysfunction such as 
chronic pain. 

[0262] The hippocampus has been extensively implicated in functions for 
encoding and retrieval of information. Lesions to this structure lead to severe impairment 
in the ability to form new memories. Motivated behavior is heavily dependent on such 
memories: for instance, how a particular behavior in the past led to obtaining a goal 
object which would reduce a particular deficit state such as thirst. 

[0263] The ventral pallidum is one of the primary output sources of the NAc and 
has a number of projection sites including the dorsomedial nucleus of the thalamus. Via 
this connection, it is one of the major relays between the NAc and the rest of the brain, in 
particular prefrontal cortical regions. It has been strongly implicated in reward functions 
and is a site thought to be important for the development of addiction. 

[0264] The medial prefrontal cortex of the brain has been strongly implicated in 
reward functions and has been found to be one of the few brain sites into which cocaine 
self-administration can be initiated in animals. 

[0265] In response to reward and aversion situations, certain regions of the brain 
circuitry play a role in processing reward/aversive information to plan behavioral 
responses as discussed above. These regions are designated reward/aversion regions of 
the brain. The activation of such reward/aversion regions can be observed during positive 
and negative reinforcement using neuroimaging technology. These reward/aversion 
regions produce specific functional contributions to motivated behavior. For example, 
contributions made by some of these regions include assessment of probability. 

[0266] The following non-limiting example illustrates some aspects of the 
methods described herein in one particular implementation. 

[0267] Example (Part 1) 

[0268] Cocaine and nicotine are two of the most acutely reinforcing drugs in 
humans and in animals (Johanson & Fischman, 1989); they are also profoundly addicting 
(Gawin & Ellinwood, 1988), and have a strong co-morbidity with depression. Twenty-five 
percent of the U.S. population suffers from nicotine dependence, and smoking leads 
to about 500,000 deaths per year. Major depression is the most common psychiatric 
disorder in the U.S. today, and the number one cause of mortality in the world (Murray & 
Lopez, 1996). It is frequently co-morbid in individuals that cease nicotine or cocaine 
self-administration. Baseline anhedonia or dysthymia is also hypothesized to be a causal 
factor in the development of nicotine dependence and is observed in individuals between 
episodes of cocaine self-administration. Long-term use of psychostimulants and the 
ensuing dependence has pronounced effects on the circuitry of reward-aversion. Recent 
neuroimaging and post-mortem stereology have also documented functional and 
morphometric changes in the brain circuitry of reward-aversion with mood disorders 
(Manji et al., 2001; Ongur et al., 1998; see review by Breiter & Gasic, Appendix 10). 

[0269] The neural circuitry that mediates the rewarding (i.e., hedonic) effects of 
psychostimulants (Koob et al., 1998), or the rewarding and aversive effects of other 
stimuli (Wise et al., 1978; Wise et al., 1992; Koob, 1992; Stein & Fuller, 1992; 
Kornetsky & Esposito, 1981), can be readily studied by fMRI BOLD to obtain 
quantitative measures of changes in brain activity. These circuits include the nucleus 
accumbens (NAc), sub-lenticular extended amygdala (SLEA) of the basal forebrain, 
amygdala, ventral tegmentum (VT), and orbital gyrus (GOb), along with other paralimbic 
regions such as the anterior cingulate, insula, parahippocampus, and temporal pole. 
Together, these brain regions (referred to as the reward-aversion circuitry) appear to be 
fundamental to the assessment of motivationally salient informational features for 
organizing behavior. 

[0270] In humans, these brain regions have been shown to process expectancy 
and valuation information and the sequential effects of expectancy on subsequent 
outcomes. The differential valuation of rewarding vs. aversive outcomes utilizes the same 
brain circuitry, and unique signal profiles in a subset of these regions have been mapped 
for rewarding vs. aversive stimuli. We can characterize the relative contributions made by 
each of these subregions to discrete components of reward-aversion function in different 
individuals using paradigms (e.g., paradigms described in Breiter et al., 1996; Cohen et 
al., 1996; Seidman et al., 1998; Breiter & Rosen, 1999; Aharon et al., 2001; Becerra et 
al., 2001; Breiter et al., 2001). Thus, we can sample, for example: 

(1) stimulus input and representation, 
(2) feature extraction necessary for assessing motivational intent in others, 
(3) probability functions necessary for expectancy determination, 
(4) expectancy vs. outcome functions, 
(5) valuation functions, and 
(6) positive vs. aversive outcomes. 

We can develop a systems biology map, e.g., by using information from at least two of 
these functions. The map can describe how a stimulus that is rewarding or aversive is 
processed. 

[0271] In this exemplary case, the system assessed determines reward-aversion 
function and acts as an informational backbone for motivation. The ability to produce 
such systems biology maps in individuals further gives us a precise mechanism by which 
to characterize malfunctions in this circuitry that quantitatively characterize functional 
brain disorders such as stimulant addiction and depression. 

[0272] Circuitry-based events responsible for behavior and intracellular signaling 
events, at very different spatiotemporal scales of brain function, are interlinked, and 
processes at the distal ends of this spatiotemporal continuum can serve as markers, e.g., 
for genetic analysis. 

[0273] Analysis of brain structure and/or function produces a set of quantitative 
indices (e.g., a systems biology map) which can be associated with genetic information 
from the subject. Typically such genetic information includes markers on a plurality of 
different non-homologous chromosomes. When a sufficient number of individuals are 
analyzed, statistics can be used to evaluate the relationship between genotype and 
phenotype. Linkage from a set of quantitative indices, such as the multitude of 
quantitative measures in a systems biology map, to the quantitative measures of 
molecular genetics can pinpoint the genes that contribute to susceptibility and/or 
resistance to functional brain disorders such as addiction and depression. 

[0274] Example (Part 2): Detailed Methods 

[0275] (a) Subject recruitment, screening, and scheduling 

[0276] For Ph1, a total of 500 subjects (plus 8% of this number as potential 
replacements) will be recruited for scanning over one year (months 6 - 18 of the 
project), and then rescanned during months 19 - 30. For Ph2, a total of 800 subjects per 
year (plus 8% of this number as potential replacements) will be recruited over 4 years 
(total = 3200 + replacements). All phases of this project will be conducted according to 
the U.S. Food and Drug Administration guidelines and the Declaration of Helsinki. To 
protect all sensitive data, we will obtain a Certificate of Confidentiality from NIH. 
Written informed consent will be obtained from all patients before protocol-specified 
procedures are carried out. Subjects will be drawn from an outpatient sample, and will be 
recruited through general media, as well as physician referrals. 
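
The recruitment targets above can be checked with a short calculation. The following is a minimal sketch, assuming the 8% replacement pool is computed on the base enrollment and rounded up to whole subjects; the function name is illustrative and does not come from the protocol.

```python
def recruitment_target(base_subjects, replacement_percent=8):
    """Return (replacements, total) for a base enrollment plus a replacement pool.

    Assumption: the replacement pool is rounded up to a whole number of subjects.
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    replacements = (base_subjects * replacement_percent + 99) // 100
    return replacements, base_subjects + replacements


ph1 = recruitment_target(500)            # Ph1: one recruitment year
ph2_per_year = recruitment_target(800)   # Ph2: each of the 4 years
ph2_total = recruitment_target(800 * 4)  # Ph2: all 4 years combined
```

Under these assumptions, Ph1 calls for 40 potential replacements (540 subjects in all) and Ph2 for 256 (3456 in all).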

[0277] Inclusion Criteria: The following conditions must be met for patient 
eligibility: 

(1) Written informed consent. 

(2) Men and women aged 20-65 years (as sib pairs who are concordant or discordant 
for the criteria below, or in nuclear families with these diagnoses). 

(3) Nicotine dependent subjects: 

a. Smokers who have smoked > 10 cigarettes/day for more than 2 years 

b. Meet DSM-IVR criteria for Nicotine Dependence, as determined with 
the Fagerstrom Nicotine Tolerance Questionnaire (FTQ) (Fagerstrom, 1978) 

c. Saliva cotinine levels > 14 µg/L, and end-expiratory carbon monoxide 
levels > 8 ppm (subjects with alcohol dependence will be excluded). 

(4) Cocaine dependent subjects: 

a. DSM-IVR diagnosis of cocaine dependence (subjects who are actively using 
cocaine at the time of entry and are not seeking or participating in treatment for 
addiction) without other Axis I psychiatric illnesses or past experience of violent 
behavior while abusing cocaine or opiates. Exceptions will be made for 
dependence on caffeine or for mild (less than 3 standard drinks/week) 
consumption of alcohol. 

b. Validation of subjects' self-reported drug use will be performed using 
hair specimens assayed for levels of commonly abused drugs. Our previously 
published data indicate that self-reported substance use in our non-treatment-seeking 
research subjects is generally valid (Elman et al., 1999). 

(5) Subjects with recurrent major depressive disorder: 

a. DSM-IVR criteria for lifetime diagnoses of Unipolar Depressive Disorders 
(major depression, dysthymia, and minor depression), according to the 
Structured Clinical Interview for DSM-IV Axis I Disorders/Patient Edition 
(SCID-I/P) (First et al., 1995) and Inventory for Depressive Symptomatology 
(Rush et al., 1986, 1996; Gullion et al., 1998). The DSM-IVR SCID approach 
will be used by clinicians, and complemented with a SSAGA-II performed by 
trained research assistants. 

(6) Subjects with (3) and/or (4) and/or (5) (see Dierker et al., 2002) 

(7) Healthy controls without (3) or (4) or (5) 

[0278] Exclusion Criteria: In brief, subjects with any of the following will be 
excluded: pregnancy; suicidality or homicidality; serious medical illness including HIV+ 
status; severe respiratory compromise; current use of nicotine-containing products in 
subjects without nicotine dependence; history of seizure disorder; delirium, dementia, or 
mental disorders due to general medical conditions; substance abuse not specified above; 
schizophrenia; schizoaffective disorder; delusional disorder; bipolar disorder; psychotic 
disorders not elsewhere specified; antisocial personality disorder, unless comorbid with 
cocaine dependence; current anorexia/bulimia nervosa; clinical laboratory evidence of 
hypothyroidism/hyperthyroidism. 

[0279] (b) Experiments 

[0280] The six experimental paradigms listed below will be run with all subjects 
in Ph1 and Ph2. For each subject, these six paradigms will be run in the order in which 
they are listed. The time needed for each paradigm will be: 

#1 social reward paradigm, 22 minutes, 

#2 CPT / probability paradigm, 4 1/2 minutes, 

#3 physiological aversion / pain paradigm, 4 1/2 minutes, 

#4 mental rotation paradigm, 8 min and 54 seconds, 

#5 emotional faces paradigm, 11 1/2 minutes, 

#6 monetary reward paradigm, 24 minutes. 

[0281] Between paradigms, 2 minutes are scheduled to allow the reading of the next 
set of instructions. The total time for functional imaging will thus be 75 1/2 minutes, 
plus approximately 10 minutes for the 5 x 2 minute pauses between the scanning of each 
paradigm. This leaves 35 minutes for the structural scanning described in the imaging 
section. Of these 6 paradigms, 4 have a traditional block design (#2 - #5), while 2 
(#1 & #6) have a single trial-like design. These paradigms have been chosen because 
they robustly activate reward-aversion circuitry, to produce a systems biology map. 
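
The session timing above can be tallied as follows. This is a minimal sketch that uses only the durations listed in the text; the dictionary keys and function name are illustrative. The exact sum of the six paradigm durations is 75.4 minutes, which the text rounds to 75 1/2.

```python
from fractions import Fraction

# Paradigm durations in minutes, as listed in the text.
PARADIGM_MINUTES = {
    "social reward": Fraction(22),
    "CPT / probability": Fraction(9, 2),
    "thermal pain": Fraction(9, 2),
    "mental rotation": Fraction(8) + Fraction(54, 60),  # 8 min 54 s
    "emotional faces": Fraction(23, 2),
    "monetary reward": Fraction(24),
}
PAUSE_MINUTES = 2        # instruction pause between consecutive paradigms
STRUCTURAL_MINUTES = 35  # time left for structural scanning


def session_budget():
    """Return (functional, pauses, total) minutes for one imaging session."""
    functional = sum(PARADIGM_MINUTES.values())
    pauses = PAUSE_MINUTES * (len(PARADIGM_MINUTES) - 1)
    total = functional + pauses + STRUCTURAL_MINUTES
    return float(functional), pauses, float(total)
```

With the stated figures, the functional, pause, and structural blocks together imply a session of roughly two hours.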

[0282] (1) Social reward paradigm (Aharon et al., 2001) 

[0283] Social stimuli will consist of two sets of 40 non-famous human faces 
[digitized at 600 dpi in 8-bit grayscale, spatially down sampled, and cropped to fit in an 
oval "window" sized 310-350 pixels wide by 470 pixels high using Photoshop 4.0 
software (Adobe Systems)]. Each set will consist of 20 male and 20 female faces. 
Subjects will be told that they will be exposed to a series of pictures that, if not interfered 
with, will change every eight seconds. However, if they want a picture to disappear 
faster, they can alternate pressing the "z" and "x" keys, whereas if they want a picture to 
stay longer on the screen, they can alternate pressing the "n" and "m" keys. The 
dependent measures of interest will be the amount of work in units of keypresses that 
subjects exert in response to the different categories of stimuli, and their resulting 
viewing durations. 

[0284] Each pair of key presses will be set to increase or decrease the total 
viewing time according to the following formula: NewTotalTime = OldTotalTime + 
(ExtremeTime - OldTotalTime) / K, where ExtremeTime will be 0 seconds for keypresses 
reducing the viewing time, ExtremeTime will be 14 seconds for keypresses increasing the 
viewing time, and K will be a scaling constant set to 40. If the elapsed time for the picture 
surpasses the total time determined by keypressing, the picture will be removed and the next 
trial will begin. A "slider" will be displayed left of each picture indicating total viewing time at 
any moment, and changing with every keypress. Subjects will be informed that the task 
will last 40 minutes, and that this length is independent of their behavior during the task, 
as is their overall payment for participating in the experiment. 
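
The keypress update rule can be sketched directly from the formula above. This sketch assumes the starting total viewing time is the 8-second default mentioned in the task instructions; the function names are illustrative.

```python
DEFAULT_TIME = 8.0  # assumed starting total: pictures change every eight seconds
MAX_TIME = 14.0     # ExtremeTime for keypresses that increase viewing time
MIN_TIME = 0.0      # ExtremeTime for keypresses that decrease viewing time
K = 40              # scaling constant


def update_total_time(old_total, increase):
    """Apply one pair of keypresses: move 1/K of the way toward the extreme."""
    extreme = MAX_TIME if increase else MIN_TIME
    return old_total + (extreme - old_total) / K


def total_after(n_pairs, increase, start=DEFAULT_TIME):
    """Total viewing time after n_pairs consecutive keypress pairs."""
    total = start
    for _ in range(n_pairs):
        total = update_total_time(total, increase)
    return total
```

Because each keypress pair closes a fixed fraction of the remaining gap, the viewing time approaches 0 or 14 seconds asymptotically, so each additional unit of work buys progressively less change.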

[0285] (2) CPT with differential probability conditions (Breiter & Rosen, 1999; 
Seidman et al., 1998) 

[0286] The set of experimental conditions in this study is designed to parse out 
differences in vigilant attention during a serial processing continuous performance task 
[CPT-AX(del)], involving a simple probabilistic relationship between a cue and delayed 
target, versus a dual processing continuous performance task [CPT-AX(int)], with a 
complex probability relationship between a cue and delayed target. The conditional 
probability of a subsequent target, given the incidence of a cue, will be the same between 
tasks since the CPT-AX(del) and CPT-AX(int) tasks have the same total number of 
cue-target pairs, and the same total incidence of true cues plus false cues. The tasks will be 
different in that the determination of cue-target pairs is more effortful for the 
CPT-AX(int) task, due to divided processing and interference suppression needs. The effortful 
determination of cue-target pairs will impair probability computation and lead to 
diminished task performance. 

[0287] The two paradigm conditions will involve computer presentation of an 
auditory letter string, with each letter spoken at a rate of 1 per second. These paradigms 
will have an A-B-A-C-A-C-A-B-A design where the A condition will be a simple CPT 
(referred to as the "QA" sequence), and the B condition will be an effortful CPT with 
three letters between cue and target pairs. The B and C conditions will involve either 
serial processing (CPT-AX(del)) or divided/dual processing (CPT-AX(int)). The 
CPT-AX(del) is characterized by a lack of false cues or targets between each cue ("q") and 
target ("a") pair, and of any interdigitated cue-target pairs (i.e., "q"_"q"_"a"_"a"), thus 
allowing simple probabilistic assessment of cue to target pairing with serial association of 
stimulus and response. The CPT-AX(int) has false cues and/or targets between pairs of 
cues and targets, and has cue-target pairs interdigitated so that commingled pairs 
are possible, thus preventing simple counting or rehearsal procedures (i.e., forcing 
subjects to maintain two or more counts), and increasing the effort needed for 
probabilistic assessment of cue to target pairing. Each B and C epoch will last 90 
seconds, while the baseline A epochs will last 60 seconds. There will be a target to 
distracter ratio of .13 for both A and B conditions, and the number of cue-target pairs will 
be the same. Subjects will respond with a magnet-compatible button press, so that 
reaction time and accuracy can be recorded. The order for performing the CPT-AX(del) 
and the CPT-AX(int) will be counterbalanced across subjects. 

[0288] (3) Physiological Aversion (Thermal Pain) (Becerra et al., 2001) 

[0289] Subjects will be informed in detail about the nature of the experiment, and 
the temporal sequence of procedures, including rating methods. After the functional run, 
subjects will rate their perception of the pain they experienced on a scale from 0 (no pain) 
to 10 (maximum pain). Thermal stimuli will be delivered using a modified 
[Becerra et al., 1999] Peltier based thermode (Medoc, Haifa, Israel). One scan will be 
performed during which a base temperature of 35 °C (30 s) (condition A), a warm 
stimulus of 41 °C (25 s) (condition B), and a target temperature of 46 °C (condition C) 
will be interleaved. The thermode will be set to change the temperature at a rate of 4 
°C/s. Thus, it will take 2 s to reach 41 °C from the baseline and 2 s to return to baseline, 
while for the 35-46 °C contrast, the delay will be 4 seconds. The delays will not be part of 
the baseline (30 s) or stimulus (25 s) times. The three stimuli will be interleaved in a 
block design: A-B-A-C-A-C-A-B-A. 

[0290] (4) Mental rotation (Cohen et al., 1996) 

[0291] The figures will be the original Shepard and Metzler (1971) objects. They 
will thus consist of three-dimensional perspective drawings of 10 cubes arranged in chiral 
patterns and viewed from a variety of rotation angles. Two task variants will be used. In a 
control condition, subjects will be shown a pair of figures, half of which are identical, and 
half of which are mirror-reversed shapes. Each of the 10 possible angled-shapes (0 - 180° 
in 20° increments) will appear in each type of pair. The stimulus ordering will consist of 
a set of blocks, so that each of the stimuli appears once before any stimulus appears twice, 
and each appears twice before any appears three times, and so forth. Within each of these 
blocks, the stimuli will appear in random order except that the same stimulus will not 
appear twice within three successive trials. Moreover, half of the pairs within each block 
will include identical figures and half include mirror-reversed figures. No more than three 
consecutive trials can have the same response. 

[0292] The second version of the task (a rotation variant) will be identical to the 
first except that the members of each pair will be presented at different orientations. The 
left member will always be presented so that the major axis is vertical. The right member 
will be presented at nine possible angles (20 - 180° in 20° increments) from vertical. 
Three sets of these rotation trials will be used (and 4 sets of control trials), which will 
include rotations around different major axes. One set will include rotations around the 
x-axis, another around the y-axis, and another around the z-axis. These stimuli will be 
presented in separate sets. Within each set, the stimulus trials will be ordered so that each 
orientation appears once before it appears again, once with identical stimuli and once 
with mirror-imaged stimuli, within each balanced subgroup of 18 trials. The same 
orientation will not appear twice within three consecutive trials. 

[0293] A third "resting" or fixation condition will be interleaved between the 
"control" and "rotation" tasks. Subjects will be asked to look at each pair, and to decide 
whether the figures are identical or are mirror-images and to indicate their choice by 
pressing one of two buttons. In the control condition, subjects will be asked to simply 
respond as quickly and accurately as possible. In the rotation condition, they will be told 
to visualize the right-hand stimulus rotating until it is aligned with the left-hand stimulus, 
and then to decide whether the two shapes are identical or are mirror-reversed. 

[0294] (5) Emotional faces (Breiter et al., 1996) 

[0295] Faces used in these experiments will be from Ekman and Friesen (1976). 
They will have been standardized by (a) digitization, (b) scaling of extents, (c) 
normalization of contrast across all expressions for each of the individuals utilized (N=8), 
and across all individuals in the cohort, and (d) fitting with an oval mask to minimize the 
observation of hair. 

[0296] The experiment will employ an A-B-A-C-A-D-A-B-A-D-A-C-A-B-A 
design with equal length epochs of tachistoscopic-like presentations of the faces. In A, 
subjects will see 180 presentations of 8 faces in random order; neutral expressions 
(200 msec) will be followed by a fixation point (300 msec). In B, C, and D, subjects will 
see faces with one emotion presented 180 times per epoch with the same timing 
parameters as in A. These facial expressions will be: happy (B), angry (C), and fearful 
(D). The order in which these blocks of facial expression are presented will be 
counterbalanced by emotion, and by epoch order within run. This will be a covert 
paradigm design with passive viewing of tachistoscopic-like face presentations, and use 
of the same 8 individuals for each expression presented in random order per epoch. 

[0297] (6) Monetary expectancy, gains, and losses (Breiter et al., 2001) 

[0298] In this experiment we seek to map the hemodynamic changes that 
anticipate and accompany monetary losses and gains under varying conditions of 
controlled expectation and counterfactual comparison. The display will consist of either a 
fixation point or one of 2 disks ("spinners"). Each spinner will be divided into 2 sectors. 
Both spinners will offer the same outcomes, a gain of $+10 or a loss of $-8, but the 
likelihood of the gain will be high (0.66) on the "good" spinner and low (0.33) on the 
"bad" spinner. The relative areas of the spinner assigned to the two outcomes represent 
the likelihoods. Thus, on the good spinner, 66% of the area is colored green and labeled 
$+10, and the remaining 33% of the area is colored red and labeled $-8; on the bad 
spinner, the colors and labels are reversed. Providing larger gains than losses will be 
implemented to compensate for the tendency of subjects to assign greater weight to a loss 
than to a gain of equal magnitude. 
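
The expected monetary value of each spinner follows from the stated outcomes and likelihoods. The sketch below uses the probabilities exactly as given in the text (0.66 and 0.33, which do not quite sum to 1); the function name is illustrative.

```python
GAIN = 10.0   # dollars won on the green sector
LOSS = -8.0   # dollars lost on the red sector


def expected_value(p_gain, p_loss):
    """Expected dollar outcome of one spin, given the two sector likelihoods."""
    return p_gain * GAIN + p_loss * LOSS


good_spinner = expected_value(0.66, 0.33)  # approximately +3.96
bad_spinner = expected_value(0.33, 0.66)   # approximately -1.98
```

The two spinners therefore differ in expectancy even though they offer identical outcomes, which is what allows expectancy and outcome effects to be separated.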

[0299] Before the game begins, subjects will be shown each spinner 3 times so as 
to learn its composition. Each trial will consist of (1) an "expectancy phase," when a 
spinner is presented and an arrow spins around it, and (2) an "outcome phase," when the 
arrow lands on one sector and the corresponding amount is added to or subtracted from 
the subject's winnings. During the expectancy phase, the image of one of the 2 spinners 
will be projected for 10 sec, and the subject will score their emotional response to the 
displayed spinner (or fixation point) using a potentiometer. During the outcome phase, 
the arrow will land on one of the sectors and flicker for 9.5 seconds, indicating how much 
they won or lost. During this time, subjects will score their emotional response to the 
observed outcome. After 9.5 seconds, a 0.5 second mask will appear. On fixation-point 
trials, an asterisk will appear in the center of the display for 19.5 sec, followed by the 
0.5-sec mask. The pseudo-random trial sequence will be fully counter-balanced to the first 
order so that trials of a given type (spinner + outcome) are both preceded and followed by 
the same number of 4 spinner/outcome combinations and 2 times by fixation-point trials. 
Subjects will observe 24 trials of the +$10 outcome, 24 trials of the -$8 outcome, and 16 
trials of spinner baseline. A "dummy" trial will be inserted at the beginning and end of 
each run for counterbalancing, allowing 18 trials per run for 4 runs. Runs will be 
separated by 2 min rest periods. The same trial sequence will be used for all subjects, 
generating winnings of $48, to which will be added the $50 endowment. 
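
The trial counts, scan time, and winnings stated above are mutually consistent, as the following sketch shows. It assumes that dummy trials have the same 20-second length as scored trials and that scored trials are split evenly across the 4 runs; the names are illustrative.

```python
TRIAL_SECONDS = 20  # 10 s expectancy + 9.5 s outcome + 0.5 s mask
N_RUNS = 4
TRIALS = {"gain": 24, "loss": 24, "baseline": 16}
DUMMIES_PER_RUN = 2  # one at the beginning and one at the end of each run
GAIN, LOSS, ENDOWMENT = 10, -8, 50


def monetary_paradigm_summary():
    """Return (trials per run, scan minutes, winnings, winnings plus endowment)."""
    scored = sum(TRIALS.values())
    per_run = scored // N_RUNS + DUMMIES_PER_RUN
    scan_minutes = per_run * N_RUNS * TRIAL_SECONDS / 60
    winnings = TRIALS["gain"] * GAIN + TRIALS["loss"] * LOSS
    return per_run, scan_minutes, winnings, winnings + ENDOWMENT
```

Under these assumptions the tally gives 18 trials per run, 24 minutes of scanning (matching the time listed for paradigm #6), and winnings of $48 before the endowment.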

[0300] (c) Imaging (3T and 7T) 

[0301] Five hundred subjects in Ph1 and potentially 3200 subjects in Ph2, plus 
replacement subjects, will be scanned on a 3.0 T Allegra System (Siemens) using a 
quadrature Siemens head coil. The Siemens system performs a whole head shimming 
procedure before scanning begins, which incorporates a full array of second order shims 
to optimize B0 homogeneity, and thus reduce susceptibility/resistance in targeted 
reward-aversion regions of interest. Imaging for all experiments will begin with a 3-plane scout 
scan (conventional FLASH sequence with isotropic voxels of 2.8 mm). The axial and 
coronal scouts will be used for placement and prescription of a 3D MPRAGE anatomic 
scan, which will be used for anatomic localization of functional activation, and 
quantitative volumetric measurements. Prescription of experimental slices will follow this 
sequence with 30 slices parallel to the AC-PC line and covering the NAc, amygdala, 
SLEA, hypothalamus, VT and GOb, along with most of the lateral prefrontal cortex, and 
components of the parietal-occipital junction. BOLD imaging will then be performed 
using a gradient echo T2* weighted sequence (TR/TE = 2000/29 ms; FOV = 20 cm; 
in-plane resolution 3.125 x 3.125 mm; slice thickness = 3 mm; 30 contiguous axial slices). 

[0302] For Ph1, 100 subjects will be scanned on a 7.0T ultra high field system 
developed for functional brain studies. If the results of comparison between the 3T and 
7T systems are favorable to the 7T system, then the 3200 subjects of Ph2 will be scanned 
on it. The 7T system consists of a whole body magnet (Magnex Scientific) with a custom 
made resistive shim set (through 3rd order) and custom head gradient set. The study will 
obtain a 3D MPRAGE anatomic scan, and then a conventional T2 scan at the same slice 
locations as the functional prescription. Functional imaging will consist of a high 
resolution (1.5 mm x 1.5 mm x 3 mm) single shot gradient echo sequence, covering less 
brain volume than the 3T scanning protocol, but including the NAc, amygdala, SLEA, 
hypothalamus, VT and GOb, along with components of the lateral prefrontal cortex and the 
parietal-occipital junction. 

[0303] (d) Data analysis of neuroimaging data 

[0304] Anatomic segmentation/parcellation for volumetrics and activation 
localization 

[0305] The anatomic scans of all subjects will undergo segmentation and 
parcellation. Segmentation methodology based on intensity contour and differential 
intensity contour concepts can be used (see, e.g., Kennedy et al., 1989; Caviness et al., 
1996; Filipek et al., 1994; Rademacher et al., 1992). The cortical parcellation technique is 
based upon the concept of limiting sulci and planes and takes advantage of the observed 
relationships between cortical surface features and the location of functional cortical 
areas. A critical advantage of this method is that the definitions are unambiguously 
definable in a standardized fashion from the information visible in high resolution MRI. 

[0306] To perform this process with 500+ subjects for Ph1 and 3200+ subjects for 
Ph2, we can use an automated, fully 3D procedure for whole-brain segmentation. The 
technique uses a set of manually labeled brains as a training set in order to compute prior 
probabilities and class statistics, and applies a Bayesian classification rule. Specifically, 
we compute the maximum a posteriori (MAP) estimate of the segmentation given an 
input image I and prior information from the training set. Formally this can be expressed 
as maximizing p(W|I), the probability distribution of the segmentation W given the observed 
image intensities. The prior probability of a given segmentation is initially encoded 
assuming that the classification at each voxel is independent of all other voxels. This 
constraint is then relaxed, and the image is iteratively resegmented using an anisotropic 
Markov random field to model the image segmentation, resulting in a final segmentation 
that is more spatially uniform as well as more accurate than the initial one. 
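
The initial, spatially independent step of the MAP rule can be illustrated for a single voxel. This is a minimal sketch, assuming Gaussian intensity likelihoods per tissue class; the class names, means, standard deviations, and priors are hypothetical placeholders rather than values estimated from any training set.

```python
import math


def gaussian_log_likelihood(x, mean, std):
    """Log of the normal density with the given mean and std, evaluated at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def map_label(intensity, class_stats, priors):
    """Return the label W maximizing log p(I | W) + log p(W) for one voxel."""
    best_label, best_score = None, -math.inf
    for label, (mean, std) in class_stats.items():
        score = gaussian_log_likelihood(intensity, mean, std) + math.log(priors[label])
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical class statistics (mean, std) and prior probabilities.
CLASS_STATS = {"csf": (30.0, 10.0), "gray": (80.0, 10.0), "white": (120.0, 10.0)}
PRIORS = {"csf": 0.2, "gray": 0.5, "white": 0.3}
```

For example, `map_label(100.0, CLASS_STATS, PRIORS)` sits midway between the hypothetical gray and white means, so the prior decides the label in favor of the more probable class.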

[0307] Manually parcellated surfaces will also be used as a training set that can 
be employed to construct classifiers in an analogous manner to the sub-cortical 
segmentation procedure. This process will depend on two properties of the cortical 
surface. The first is mean curvature of the surface, a differential measure of the surface 
folding computed from the trace of the Hessian matrix of the height function of the 
surface over its tangent plane at each point. The second is the average convexity of the 
surface (Fischl et al., 1999), a measure that is more sensitive to the presence of primary 
folds than to secondary or tertiary folds. The initial labeling is performed by assuming the 
classification is spatially independent, so that the probability of the neuroanatomical label 
at each point in the cortex is independent from all other cortical locations. This is of 
course not the case, as the probability of each label is related to the labels of the 
neighborhood in which it lies. In order to capture the spatial regularity of the labeling, we 
model the surface labeling using an anisotropic Markov random field. The anisotropy 
comes from the observation that labels are much more likely to change as one moves 
across the cortex in the direction of maximum curvature (i.e., the first principal direction) 
than in the direction of minimum curvature (i.e., the second principal direction). This 
information is encoded in the form of Gibbs priors on the probability of a given labeling. 
The most probable labeling is then iteratively recomputed using the independent spatial 
labeling as input, using the Iterated Conditional Modes (ICM) algorithm (Besag, 1974). 
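
The iterative relabeling step can be sketched with a generic ICM loop. This is a simplified illustration on a regular 2D grid rather than a cortical surface: it assumes a Potts-style penalty for disagreeing neighbors, with separate weights along rows and columns standing in for the direction-dependent (anisotropic) prior. The function name, grid layout, and weights are assumptions, not the method's actual parameterization.

```python
def icm(unary, beta_row=1.0, beta_col=1.0, max_iters=10):
    """Iterated Conditional Modes on a 2D grid.

    unary[r][c][k] is the cost (e.g., negative log-likelihood) of label k at
    site (r, c). beta_row penalizes disagreement with the neighbors above and
    below; beta_col penalizes disagreement with left and right neighbors.
    """
    rows, cols = len(unary), len(unary[0])
    n_labels = len(unary[0][0])
    # Initial labeling: each site classified independently of its neighbors.
    labels = [
        [min(range(n_labels), key=lambda k: unary[r][c][k]) for c in range(cols)]
        for r in range(rows)
    ]
    neighbors = ((-1, 0, beta_row), (1, 0, beta_row), (0, -1, beta_col), (0, 1, beta_col))
    for _ in range(max_iters):
        changed = False
        for r in range(rows):
            for c in range(cols):
                def cost(k):
                    total = unary[r][c][k]
                    for dr, dc, beta in neighbors:
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc] != k:
                            total += beta
                    return total

                best = min(range(n_labels), key=cost)
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break  # converged to a local optimum
    return labels
```

Each site is updated to the label that minimizes its own cost given the current labels of its neighbors, so an isolated label that is only weakly supported by the data is absorbed by its surroundings.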

[0308] To further investigate the surface-based structure of each subject's brain, 
we will use a set of automated tools for the construction of geometrically accurate 
and topologically correct models of the cortical sheet. These include accurate 
segmentation of gray matter and white matter (Dale et al., 1999), inflation and flattening 
of the surface models for visualization and analysis purposes (Fischl et al., 1999, 2000, 
2002; Sereno et al., 1995), and automatic correction of topological defects (Fischl et al., 
2002). The explicit construction of both the gray/white and pial surface boundaries 
allows the accurate measurement of the thickness of the cortical sheet (Fischl et al., 
2000). The thickness of cortex is a potentially important diagnostic measure for a variety 
of neurodegenerative and psychiatric disorders, many of which are associated with 
progressive, regionally specific atrophy of the gray matter (for instance, consider 
alterations in prefrontal and temporal cortex volume observed with cocaine dependence; 
Franklin et al., 2002). 

[0309] FMRI data preparation 

[0310] For this project, four of the experimental paradigms have a traditional 
block design, and 2 have a single trial-like design. For the block design experiments, data 
preparation will generally follow the procedures specified in Aharon et al. (2001), while 
for the single trial-like designs, it will generally follow procedures specified in Breiter et 
al. (2001). In Ph1, data preparation and assessment of main effects between groups 
(below) will involve analysis using FSL / FS-FAST. Data preparation will involve motion 
correction, intensity normalization, signal detrending, and spatial filtering. For example, 
after motion correction, time series data will be inspected to ensure that no data set 
evidences residual motion in the form of cortical rim or ventricular artifacts > 1 voxel. 
Functional data will then be intensity scaled on a voxel-by-voxel basis to a standard of 
1000, so that all mean baseline raw magnetic resonance signals are equal. These data will 
then be detrended to remove any linear drift over the course of the scan. Spatial filtering 
will be performed using a Hanning filter with 1.5 voxel radius (this approximates a 0.7 
voxel gaussian filter). Lastly, the mean signal intensity for each voxel over all runs will be 
removed on a time point by time point basis. For the single trial-like experiments, data 
will further be selectively averaged and normalized relative to the 4 time points of data 
preceding the trial (see Breiter et al., 2001). 
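
Two of the preparation steps above, intensity scaling and linear detrending, can be sketched for a single voxel time series. This sketch assumes that scaling sets the mean of each voxel's time series to 1000 and that detrending removes the slope of a least-squares line while preserving the mean; it is a generic illustration and does not reproduce the FSL or FS-FAST implementations.

```python
def scale_to_baseline(series, target=1000.0):
    """Rescale one voxel's time series so that its mean equals target."""
    mean = sum(series) / len(series)
    return [x * target / mean for x in series]


def remove_linear_trend(series):
    """Remove the least-squares linear drift, keeping the series mean.

    Requires at least two time points.
    """
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    return [y - slope * (t - t_mean) for t, y in enumerate(series)]
```

Applying `remove_linear_trend(scale_to_baseline(raw))` to each voxel yields series that share a common baseline and are free of linear scanner drift before any correlation analysis.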

[0311] The data analytic procedures used in this project will be based on two 
assumptions, (a) that the behavior of the hemodynamic control system is approximately 
linear (i.e., it obeys the superposition principle) under the experimental conditions tested 
and in the brain regions targeted by these paradigms, and (b) that deviations from 
hemodynamic stationarity will be correctable by means of the normalization procedures 
employed. If the hemodynamic control system obeys superposition and stationarity, then 
the counterbalancing procedure used in each paradigm ensures that any carryover of 
hemodynamic responses from antecedent experimental conditions will be constant across 
conditions. 

[0312] Two separate approaches will be applied to the evaluation of salient 
changes related to experimental condition. The first will be based on the evaluation of 
individual data on its native anatomy. The second will be based on the evaluation of 
aggregate effects on averaged data that are then evaluated within all individuals in the 
cohort, using a standardized anatomical space. 

[0313] Individual data on native anatomy: Individual analyses will be pursued 
since aggregate analyses may produce Type II errors in the case of (1) opponent responses 
to different experimental conditions, which would tend to cancel as a result of averaging, 
or (2) responses confined to a small proportion of trial types or confined to a putative 
phenotype, which may be diluted by averaging. For single trial-like experiments, data 
obtained at all time points for each experimental condition will be statistically evaluated 
by correlation with a model impulse function (Boynton et al., 1996; Dale & Buckner, 
1997). To eliminate cross-trial hemodynamic overlap, statistical maps will be derived 
from correlation between the γ function and a difference signal between each 
experimental condition and the paradigm baseline. For block design experiments, a γ 
function will be convolved with the experimental time course, and used in a correlation 
analysis. For both single trial-like experiments and block design experiments, the 
outcome of correlation analysis will be assessed for foci of signal change using a 
cluster-growing algorithm (for example: Bush et al., 1996). Clusters selected for further analysis 
will be required to either meet a corrected statistical threshold, or have signal intensity 
changes from baseline > 0.05%. For the corrected statistical threshold (in order to 
maintain an overall α < 0.05), the cluster-growing algorithm will localize activations that 
meet a corrected p value threshold of p < 0.00075 (0.05/67) for the number of 
segmentation and parcellation regions searched. Regions of interest (ROIs) will be 
delineated by the voxels with p < 0.05 in a 7 mm radius of the voxel with the minimum p 
value (the "max vox"). These ROIs will then be used to sample the % signal intensity 
change per condition from the experimental baseline. 
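
The block-design correlation analysis and the corrected threshold can be sketched as follows. The sketch assumes a simple gamma-shaped impulse response sampled at the 2-second TR; the delay, time constant, and exponent used here are illustrative placeholders, and all function names are hypothetical.

```python
import math

N_REGIONS = 67  # segmentation and parcellation regions searched
ALPHA = 0.05    # desired overall alpha


def bonferroni_threshold(alpha=ALPHA, n_tests=N_REGIONS):
    """Per-region p value threshold that keeps the overall alpha at the target."""
    return alpha / n_tests


def gamma_hrf(n_points, tr=2.0, delay=2.25, tau=1.25):
    """Gamma-shaped impulse response sampled every tr seconds (assumed parameters)."""
    out = []
    for i in range(n_points):
        t = i * tr - delay
        out.append(0.0 if t <= 0 else (t / tau) ** 2 * math.exp(-t / tau))
    return out


def convolve(signal, kernel):
    """Causal discrete convolution, truncated to the length of signal."""
    return [
        sum(signal[j] * kernel[i - j] for j in range(max(0, i - len(kernel) + 1), i + 1))
        for i in range(len(signal))
    ]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Example: an off/on block time course convolved with the impulse response.
boxcar = ([0.0] * 10 + [1.0] * 10) * 3
regressor = convolve(boxcar, gamma_hrf(10))
```

A voxel's detrended time series would be correlated with `regressor` using `pearson`, and the resulting p value compared against `bonferroni_threshold()`, which evaluates to roughly 0.00075.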

[0314] Identified ROIs will be localized via superposition of the segmentation 
and parcellation contours produced as described above. The % signal intensity change 
from baseline for each of the experimental conditions in the task will then be quantitated 
and organized in a matrix based on the anatomic segmentation and parcellation units vs. 
hemispheric laterality. If a focus of signal change is observed in an anatomic 
segmentation/parcellation unit, it will be noted in the matrix for that anatomic region as 
an itemset of % signal changes from baseline of each experimental condition. Each 
experiment will have an independent matrix, as will the volumetric and clinical 
information. 

[0315] Aggregated individuals in Talairach space: Analysis of individuals may 
miss low-level signal changes observed in common across the cohort, hence an analysis 
on aggregate data will also be utilized. The outcomes of this analysis that are not found 
by the analysis of individuals on native anatomy (above) will supplement the results 
found above. Indeed, these results will constitute a second matrix for each experimental 
paradigm, thus producing a total of twelve matrices with functional data, along with two 
more from volumetric and clinical data. 

[0316] Analysis of aggregated individuals will identify foci of signal change 
across the aggregate, and apply ROIs of these foci to individuals to sample the % signal 
intensity change per condition from the experimental baseline. It must be noted that such 
analyses from the aggregate may produce Type II errors in the case of opponent responses 
to different experimental conditions, which would tend to cancel as a result of averaging, 
or responses confined to a small proportion of trial types or confined to a putative 
phenotype, which may be diluted by averaging. 

[0317] To allow averaging of data across subjects (for Ph1, 250+ subjects and 
then 500+ subjects; for Ph2, 1600+ subjects, and then 3200+ subjects), each individual's 
set of functional data and structural data will be transformed into Talairach space (Breiter 
et al., 1996a, c; Talairach & Tournoux, 1988), and resliced in the coronal orientation with 
isotropic voxel dimensions (x, y, z = 3.125 mm for 3T, and = 1.5 mm for 7T). Optimized 
fit between functional data and structural scans will then be obtained via translation of 
exterior contours. These Talairach transformed functional and structural scans will then 
be averaged. The same procedures for ROI identification used with the individual data on 
native anatomy will then be used to identify a set of activation clusters on the averaged 
data to then be used as ROIs to sample from each Talairach transformed individual data 
set the % signal intensity change per condition from the experimental baseline. In the 
averaged data, only activations will be selected that meet a corrected p value threshold of 
p < 0.00075 (0.05/67) for the number of segmentation and parcellation regions searched. 
Activation ROIs from the averaged data will be anatomically localized in each individual 
on the Talairach transformed individual structural data, using superimposed segmentation 
and parcellation contours that have also been morphed into the Talairach domain. Again, 
as with the data produced by analysis of individuals, the data produced from ROIs 
determined on averaged data will be listed in matrices. 
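
The corrected threshold quoted above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function names and the cluster labels are hypothetical, and the correction shown is a plain Bonferroni division of 0.05 by the 67 regions searched.

```python
from typing import Dict, List


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p threshold that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def surviving_clusters(p_values: Dict[str, float], alpha: float = 0.05,
                       n_regions: int = 67) -> List[str]:
    """Names of clusters whose p value is below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_regions)
    return sorted(name for name, p in p_values.items() if p < threshold)


# Hypothetical cluster labels and p values, for illustration only.
example = {"cluster_A": 0.0002, "cluster_B": 0.0010, "cluster_C": 0.0007}
print(round(bonferroni_threshold(0.05, 67), 5))
print(surviving_clusters(example))
```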

[0318] Classification tree analysis (phenotyping) 

[0319] To partition the neuroimaging data into the fewest number of sets for the 
quantitative indices measured, that will be predictive of any future data set obtained, an 
algorithm based on classification tree analysis will be used. These analyses will be 
performed on the functional and quantitative volumetric data organized in matrices for 
each individual. In general, these techniques split data sets presented to them into 
sub-classes, and keep track of how it was done via a decision-tree structure. This decision 
tree structure can then be used to classify novel data. There are a number of classification 
techniques, all of which aim to select the class with the highest estimated conditional 
probability without computing the whole probability distribution. These techniques 
basically differ in their employment of different biases in their first steps. Regression 
trees are basically like classification trees, with the difference that they handle continuous 
data, and given that the functional and structural data will be continuous, they will be 
utilized. For these algorithms, most of the effort goes into determining the optimal 
ordering of the variables in the decision tree, as well as the level at which to cease the 
decision process (i.e., there comes a point when all members of a branch should be in the 
same classification). These algorithms are also typically "non-parametric" in that no 
predictive model has to be hypothesized for the fitting. 

[0320] The software that will be used for this process will be the CART 
(classification and regression trees) system initially designed by Steinberg and Colla 
(1995) and distributed by Salford Systems, CA (note: this system is also incorporated into 
many statistical packages such as S-Plus). CART can lead to "over-fitting" of the data, in 
that it finds too many classes. Overfitting leads to the identification of too many itemsets 
(e.g., interesting patterns in the data), which can be a serious issue in domains with many 
multi-valued parameters, where the search space is large. To protect against this outcome, 
association rules (Srikant & Agrawal, 1995) can be used to test the salience of all the 
possible correlations between subclasses in the data, and then prune off non-informative 
decisions. A standard approach for making optimal class predictions using association 
rules is the "Large Bayes Classifier" of Meretakis and colleagues. In general, these 
techniques are computation intensive, necessitating the use of a commercial cluster box 
system or supercomputer. A recursive-partitioning technique (see, e.g., Zhang and 
Bonney, 2000) can also be used. 
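
To make the idea of recursive partitioning on continuous data concrete, the following is a minimal regression-tree sketch in plain Python. It is not the CART product named above: the split criterion (sum of squared errors), the stopping rules (`min_leaf`, `max_depth`), and the toy data are all assumptions chosen for illustration.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not values:
        return 0.0
    centre = mean(values)
    return sum((v - centre) ** 2 for v in values)


def best_split(rows: List[List[float]], targets: List[float],
               min_leaf: int) -> Optional[Tuple[int, float, float]]:
    """Return (feature, threshold, cost) of the lowest-SSE admissible split."""
    best = None
    for feature in range(len(rows[0])):
        levels = sorted({row[feature] for row in rows})
        for low, high in zip(levels, levels[1:]):
            threshold = (low + high) / 2
            left = [t for r, t in zip(rows, targets) if r[feature] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[feature] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # the stopping rule forbids very small branches
            cost = sse(left) + sse(right)
            if best is None or cost < best[2]:
                best = (feature, threshold, cost)
    return best


def grow(rows: List[List[float]], targets: List[float],
         min_leaf: int = 2, max_depth: int = 3, depth: int = 0) -> dict:
    """Recursively partition the data; leaves predict the branch mean."""
    leaf = {"predict": mean(targets)}
    if depth >= max_depth:
        return leaf
    split = best_split(rows, targets, min_leaf)
    if split is None or split[2] >= sse(targets):
        return leaf  # no admissible split improves the fit
    feature, threshold, _ = split
    left = [(r, t) for r, t in zip(rows, targets) if r[feature] <= threshold]
    right = [(r, t) for r, t in zip(rows, targets) if r[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": grow([r for r, _ in left], [t for _, t in left],
                     min_leaf, max_depth, depth + 1),
        "right": grow([r for r, _ in right], [t for _, t in right],
                      min_leaf, max_depth, depth + 1),
    }


def predict(tree: dict, row: List[float]) -> float:
    """Follow the stored decisions down to a leaf to classify novel data."""
    while "predict" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["predict"]


# Toy data: one feature (a hypothetical % signal change) and a continuous target.
rows = [[0.1], [0.2], [0.3], [1.1], [1.2], [1.3]]
targets = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
tree = grow(rows, targets)
```

The nested dictionary returned by `grow` plays the role of the decision-tree structure described above: it records how the data were split so that new observations can be routed through the same decisions.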

[0321] During Ph1, the first 60 subjects per diagnostic category scanned on the 
3T magnet (total of 250 subjects) will be used as a training set, and the subsequent 
subjects scanned will be used as a test set to assess the initial classification schema. This 
process will then be repeated using the full complement of subjects scanned on the 3T as 
a training set, and the 100 subjects scanned on the 7T as a test set. A greater specification 
of the identified classes found from the larger training set would indicate that the initial 
cohort had not produced saturation of the identified endophenotypes. The training set 
size can be enlarged as the project progresses. 

[0322] Assessment of main effects between groups 

[0323] To evaluate the efficacy of our phenotyping methods, statistical 
assessment of effects between groups and correlations between functional and structural 
measures will be performed. These analyses should produce results embedded in the 
output of the classification tree analysis. Estimates of the central tendency (location) and 
dispersion (scale) of the data distribution of the diagnostic groups will use conventional 
least squares statistics or a robust statistics module, e.g., a Tukey bisquare estimator 
(Hoaglin et al., 1983; Breiter et al., 2001). Robust statistics are less subject than 
conventional parametric statistics to the influence of outliers and provide more efficient 
estimates of location and scale of contaminated normal distributions. Although robust 
methods are more efficient when dealing with contaminated distributions, they are less 
efficient than parametric statistics when dealing with near-normal distributions. 
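
As an illustration of how a bisquare estimate resists outliers, the sketch below computes a Tukey bisquare (biweight) location estimate by iteratively reweighted averaging. The tuning constant c = 4.685, the MAD-based scale, and the sample values are assumptions made for this example; they are not taken from the robust statistics module referred to above.

```python
from statistics import mean, median
from typing import Sequence


def tukey_bisquare_location(data: Sequence[float], c: float = 4.685,
                            tol: float = 1e-8, max_iter: int = 100) -> float:
    """Robust location estimate using Tukey's bisquare weight function."""
    mu = median(data)
    # Median absolute deviation, rescaled to match the standard deviation
    # of a normal distribution.
    scale = 1.4826 * median([abs(x - mu) for x in data])
    if scale == 0:
        return mu
    for _ in range(max_iter):
        weights = []
        for x in data:
            u = (x - mu) / (c * scale)
            # Points further than c * scale from the centre get zero weight.
            weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
        total = sum(weights)
        if total == 0:
            break
        new_mu = sum(w * x for w, x in zip(weights, data)) / total
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical % signal changes with one gross outlier.
sample = [0.8, 0.9, 1.0, 1.1, 1.2, 9.0]
print(mean(sample), tukey_bisquare_location(sample))
```

The least squares mean is pulled toward the outlier, whereas the bisquare estimate stays near the bulk of the data.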

[0324] Main effects between experimental conditions, and experimental 
conditions in each diagnostic group can be assessed using multiple regression to carry out 
a random effects analysis of variance (ANOVA). Experimental condition will be defined 
as a categorical (noncontinuous) variable, thus avoiding any assumptions concerning the 
form of the time courses. For the data determined on individual native anatomy, the 
ANOVA results will need to meet a more stringent α level than the conventional 0.05 
value by correction for the number of clusters tested in each individual. For the data 
determined from averaged data, the ANOVA results will need to meet a more stringent α 
level than the conventional 0.05 value by correction for the number of clusters found on 
the averaged data. For both data measured from individual native data, and data from 
Talairach transformed data, in cases that meet the criterion α level, pair-wise contrasts 
between specific experimental conditions will then be performed. 

[0325] As a last analysis, an autocorrelation analysis will be performed among the 
functional and structural measures within diagnostic groups, and between diagnostic 
groups, using a Pearson product-moment correlation coefficient. The correction for 
performing multiple autocorrelations will be 0.05 adjusted by the number of calculations 
performed. 

[0326] fMRI power analysis 

[0327] To estimate our statistical power to detect a difference between 
experimental conditions that would segregate potential endophenotypes for cocaine 
dependent, nicotine dependent, or mood disordered subjects vs. controls, we first 
determined the expected effect size by reanalyzing prior experimental data. We 
reanalyzed data from a set of experimental stimuli that produced similar signal magnitude 
changes in reward regions to those produced by the 6 experimental paradigms described 
above, and that had a similar number of time points to those 6 paradigms in their 
shortened format planned for this project. We also selected data for these calculations 
from experiments that involved subjects with cocaine addiction, and subjects who were 
healthy controls. In one case we reanalyzed a prior cocaine infusion study (Breiter et al., 
1997), while in another we reanalyzed a morphine infusion study in healthy controls 
(Breiter et al., 2000). For each subject in the cocaine infusion study, signal from all 
voxels in the bilateral NAc (defined anatomically on Talairach-transformed images) was 
normalized, averaged, and linearly detrended. The resulting time series of 136 time points 
had an average standard deviation, across 20 infusions in 13 subjects, equal to 0.84% of 
the grand average signal level. The difference in signal between the 38 pre-infusion time 
points and the 98 post-infusion time points in a fixed volume around the peak voxel within 
the NAc was 1.50%, corresponding to an effect size of d = 1.79 standard deviations. To 
achieve 90% power to detect a signal difference of this magnitude at p < 0.05 (two-tailed) 
would require N = 15 independent comparisons (Cohen, 1988). Effect sizes of cocaine 
infusion in other subcortical structures ranged from d = 0.60 to 2.14. For the morphine 
infusion study, effect sizes of morphine infusion in drug-naive subjects ranged from d = 
0.71 to 1.67. We also note that our preliminary data indicate effects of similar magnitude 
can be found when comparing cocaine dependent and healthy control subjects with 
non-infusion paradigms such as the monetary reward paradigm. These calculations suggest 
that endophenotypes based on quantitative differences in signal change across individuals 
should be distinguishable, e.g., with 15 subjects per diagnostic group. 
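
As a check on the arithmetic above, the quoted effect size follows from dividing the pre/post signal difference by the average standard deviation of the time series, both expressed as a percentage of the grand average signal level. The symbols Δ and σ are notation introduced here for illustration only:

$$ d = \frac{\Delta}{\sigma} = \frac{1.50}{0.84} \approx 1.79 $$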

[0328] Family Based Association Studies: 

[0329] The recruitment strategy is based upon the identification of families as the 
unit of analysis. Thus we also propose family based association studies for the candidate 
gene association studies that will be performed. We have performed several family based 
association studies (see for example Wilk et al. 2001; DeStefano et al. 2002). 

[0330] Family based association tests (FBATs) evaluating association between 
markers and the various phenotypes will be conducted using the program FBAT (Laird et 
al. 2000). These tests are described in detail elsewhere (Rabinowitz and Laird 2000; 
Horvath et al. 2001). A general form of a family based association test statistic for family 
i (with n_i offspring) is 

$$ S_i = \sum_{j=1}^{n_i} X_{ij} T_{ij} $$

where X_ij is a function of the genotype data of offspring j in family i and T_ij is a function 
of the phenotype data of that offspring. For a biallelic marker a score statistic based on S_i 
can be defined as 

$$ Z = \frac{\sum_{i=1}^{N} \left[ S_i - E(S_i) \right]}{\sqrt{\sum_{i=1}^{N} V(S_i)}} $$

where E(S_i) and V(S_i) are the mean and variance of S_i under the null hypothesis of no 
linkage and N is the total number of families. If the coding of X_ij specifies an additive 
model (i.e., X_ij = the number of alleles of interest (0, 1 or 2) carried by offspring j in 
family i) and T_ij is specified as 0 for unaffected and 1 for affected, then this statistic is 
equivalent to the TDT for genotyped parent-offspring trios (Lunetta et al., 2000). 
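
The special case just described (additive coding, fully genotyped parent-offspring trios, affected offspring only) is simple enough to sketch directly. The code below is an illustration under those assumptions and is not the FBAT program: the function names and example genotypes are hypothetical, and each genotype is coded as the number of copies (0, 1 or 2) of the allele of interest.

```python
from math import sqrt
from typing import Iterable, Tuple

# (father, mother, offspring) allele counts, each 0, 1 or 2.
Trio = Tuple[int, int, int]


def trio_moments(father: int, mother: int) -> Tuple[float, float]:
    """Mean and variance of the offspring allele count under Mendelian transmission."""
    expected = (father + mother) / 2
    # Only heterozygous parents (count 1) are informative; each contributes
    # a transmission variance of 1/2 * 1/2 = 1/4.
    variance = 0.25 * ((father == 1) + (mother == 1))
    return expected, variance


def fbat_z(trios: Iterable[Trio]) -> float:
    """Z statistic summed over trios, with T_ij = 1 for every (affected) offspring."""
    numerator = 0.0
    variance_sum = 0.0
    for father, mother, child in trios:
        expected, variance = trio_moments(father, mother)
        trait = 1  # assumption of this sketch: all offspring are affected
        numerator += trait * (child - expected)   # S_i - E(S_i)
        variance_sum += trait ** 2 * variance     # V(S_i)
    if variance_sum == 0:
        raise ValueError("no informative families (no heterozygous parents)")
    return numerator / sqrt(variance_sum)


# Hypothetical trios: three transmissions and one non-transmission
# from heterozygous parents, plus one uninformative family.
example = [(1, 0, 1), (1, 1, 2), (1, 2, 1), (0, 0, 0)]
z = fbat_z(example)
```

With b transmissions and c non-transmissions from heterozygous parents, the square of this Z equals (b - c)^2 / (b + c), which is the TDT chi-square statistic; this is the equivalence noted in the paragraph above.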

[0331] When parental genotypes are available, E(S_i) can be computed by 
conditioning on the observed traits and parental marker genotypes, and is based on 
Mendelian transmission probabilities (see Horvath et al. 2001 for details). This further 
justifies the collection of parental genotype information. Rabinowitz and Laird (2000) 
invoke the statistical method of conditioning on sufficient statistics for the null 
hypothesis to construct a test of association when parental genotypes are not available. In 
this case the offspring genotype distribution is defined by conditioning on the observed 
traits, the partially observed parental genotypes and on the offspring configuration. 
Tables presenting the conditional probabilities when partial or no parental genotype 
information is available are given in the FBAT technical report portion of the FBAT 
documentation. At least two distinct offspring genotypes must be observed for a family 
to contribute to the FBAT statistic when parental genotypes are not available. The 
statistical theory of conditioning on the sufficient statistics results in correct p-values 
(type I error rate) regardless of the population admixture, patterns of missing genotypes 
or genetic model (Rabinowitz and Laird 2000). 

[0332] Both multiallelic and biallelic association tests can be used. For the 
biallelic tests an additive genetic model will be assumed with X_ij coded as described 
above. Coding of X_ij for multiallelic tests is described elsewhere (Horvath et al. 2001). 
The unknown underlying genetic model may determine which test, biallelic or 
multiallelic, is more powerful, hence both will be considered here. Two definitions of the 
trait will be employed. In the first definition T_ij = the quantitative trait for offspring j in 
family i. In the second definition T_ij = (quantitative trait - μ), where μ is a constant that is 
chosen to minimize the variance of the test statistic (Horvath et al. 2001). For these trait 
definitions, a positive Z statistic indicates that the allele is associated with a larger value. 

[0333] Sib Pair Estimates: 

[0334] In the linkage power analysis, 900 sibling pairs are mentioned, but the 
analysis is for a quantitative trait, so it uses the continuous measure for fMRI as the trait 
to which we are linking. Other than the proband, sibs are not defined as "affected" or 
"unaffected," merely by their fMRI measure(s). 

[0335] An exemplary estimate of 900 sib pairs is based upon the following 
numbers: 

Cocaine families: 213 families, 5 members in each family: 
    half (n=107) consisting of 1 parent and 4 offspring (107 probands, 321 siblings), 
    half (n=106) consisting of 2 parents and 3 offspring (106 probands, 318 siblings) 
    (total number of siblings = 639). 
    Number of sibling pairs: 
        107 x 6 = 642 sib pairs 
        106 x 3 = 318 sib pairs 
        960 sibling pairs total 

Nicotine families: 152 families, 7 members: 
    half (n=76) consisting of 1 parent and 4 children, 2 avuncular or cousin, 
    half (n=76) consisting of 2 parents, 3 children, 2 avuncular or cousin. 
        456 sib pairs 
        228 sib pairs = 684 pairs 
    For the purposes of power estimates we will add one additional sibling 
    pair (e.g. one cousin pair) for each of the 152 families: 
        152 additional pairs 
        836 sibling pairs total. 

Familial Depression families: 107 families, 10 members: 
    half (n=54) consisting of 1 parent and 4 children, 5 avuncular & cousin, 
    half (n=53) consisting of 2 parents, 3 children, 5 avuncular & cousin. 
        456 sib pairs 
        228 sib pairs = 684 
    For the purposes of power estimates we will add two additional sibling 
    pairs (e.g. two cousin pairs, perhaps from two different sibships) for each 
    of the 107 families: 
        214 additional pairs 
        898 sibling pairs total. 
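
The multipliers 6 and 3 used above are the number of distinct pairs that can be formed among 4 and 3 offspring, respectively. A short sketch (the function name is hypothetical) reproduces the cocaine and nicotine totals from the family counts given:

```python
from math import comb


def sib_pairs(n_families: int, offspring_per_family: int) -> int:
    """All distinct sibling pairs across families with equal sibship size."""
    return n_families * comb(offspring_per_family, 2)


# Cocaine families: 107 with 4 offspring, 106 with 3 offspring.
cocaine_pairs = sib_pairs(107, 4) + sib_pairs(106, 3)

# Nicotine families: 76 with 4 children, 76 with 3 children,
# plus one additional (e.g. cousin) pair per family for power estimates.
nicotine_pairs = sib_pairs(76, 4) + sib_pairs(76, 3) + 152

print(cocaine_pairs, nicotine_pairs)
```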

[0336] Example (Part 3): Detailed Description of Database 

[0337] The text that follows summarizes a number of salient features of the Brain 
Imaging, Genetics, and Behavioral Assessments Database (BIGBAD) including Database 
Design, Database Architecture, Data Entry Procedures, Data Transfer Procedures, Data 
Confidentiality, Accessibility and Security, Quality Control, and Database Personnel. 

(a) Database Design 

[0339] The Brain Imaging, Genetics, and Behavioral Assessments Database has 
been designed to meet the following objectives: (1) Receive and store all data 
(behavioral, MRI and molecular genetic) acquired during the project; (2) Provide an easy 
to use, intuitive interface which reflects the work- and dataflow defined by the project 
protocol; (3) Provide data entry interfaces for behavioral data entry which, to the extent 
possible, mimic the "actual" test forms; (4) Perform immediate, automatic quality control 
where possible (validity of data entry, e.g., type, range; redundancy checks); (5) Provide 
facilities for "manual" quality control at various stages; (6) Automate data transfer 
(behavioral measures, MRI and molecular genetic) as much as possible; (7) Simplify 
communication between the four working cores of the phenotype-genotype project; (8) 
Serve raw and processed data to the outside world, under to-be-defined access control. 

[0340] Note that for simplicity, in this section all non-MRI and non-genetic data 
(i.e., clinical, neurological, cognitive, ...) is referred to as "behavioral." The overarching 
goal of data coordination is to collect all MRI, genetics, and behavioral data acquired for 
the targeted study. In the case of MRI scanning and molecular genetic studies, collecting 
data is a relatively straightforward process; in contrast, the behavioral data is significantly 
more complex, both conceptually and practically. In general, BIGBAD has been designed 
from the assumption that, whenever possible, ALL behavioral data (raw data and 
summary scores) will be stored. 

[0341] BIGBAD has been designed in a modular fashion, making it highly 
flexible and expandable. As such, each behavioral test is implemented as a separate 
database module, developed in coordination with the PI in the Clinical Phenotyping 
Working Core responsible for that instrument. Many tests also required development of 
complex scoring algorithms, which were either defined or developed by the responsible 
PI. The result of this common effort will be automatic real-time scoring of the majority of 
instruments at the time of data entry. Diagram 1 lists the instruments included in the 
behavioral test battery. 

Diagram 1: Behavioral instruments included in the Phenotype-Genotype test battery 

Commercial and/or electronic tests 
    SCID-I/P 
    IDS 
    Fagerstrom Nicotine Tolerance Questionnaire 
    SSAGA 
    SSAGA-II 

Other tests 
    Full medical History, ROS, and Exam 
    Full neurological History, ROS, and Exam 
    Handedness 
    Pregnancy Test 
    HIV and HepC 
    LFTs, CBC, SMA-20 
    Hair toxicology 
    Urine Toxicology 
    WAIS-R 
    Saliva Cotinine 
    End-expiratory CO 



[0342] (b) Database Architecture 

[0343] The primary rationale for using an established database such as BIGBAD 
is to provide ease of communication between the working cores, ease of data entry, and 
continuity in workflow and dataflow by closely mirroring the Project's logic and 
workflow (see the diagram for information flow in Appendix 5 that organizes the 
activities performed by each working core). 

[0344] The components of the system in terms of data acquisition and processing 
can include: (i) the clinical phenotyping working core and their offline (i.e., paper and 
pencil), online (i.e., computer-based), and chemistry-based measures; (ii) the MRI 
scanner used by the neuroimaging working core and its data-analysis platforms; (iii) the 
quantitative anatomy working core and its data-analysis platforms; (iv) the neurogenetics 
working core; (v) the PCs or Linux-based workstations at each working core, that upload 
data to the central database; (vi) the database hardware and software installed and 
configured at the central database, allowing data entry and access through predefined 
access mechanisms; (vii) the central supercomputer and disk storage; and (viii) the data 
backup system (i.e., tape-farm) for the central database. 

[0345] The database can handle data acquisition from multiple working cores, 
provide full subject confidentiality, and manage repetitive testing/scanning of subjects 
throughout the course of the study. 

[0346] The database architecture can have a three-tier structure: 
    Database Layer - a relational database (server side) 
    Application Logic Layer - application logic controlling user access and query 
        execution 
    Front-end Layer - web-based graphical user interface (GUI) (for investigators 
        in each working core) 

The structural three-tier organization enables applications to be distributed over many 
physical locations and computing platforms. Investigators access the database via 
front-end interfaces (e.g., GUIs) developed to best suit their computing environments. These 
interfaces can be implemented using virtually any programming language and even other 
databases' GUIs (for example, Microsoft Access can be used as a front-end to a MySQL 
database). At the same time, investigators can seamlessly connect to multiple databases 
using one GUI. 
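
The separation of the three tiers can be illustrated with a very small sketch. Everything in it is a stand-in chosen so that the example is self-contained: Python's built-in sqlite3 module replaces the relational server, a plain dictionary replaces the server's account management, and the table, user, and function names are hypothetical.

```python
import sqlite3
from typing import List, Tuple


# --- Database layer: a relational store (in-memory stand-in) ---
def open_database() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE candidate ("
        "id INTEGER PRIMARY KEY, core TEXT NOT NULL, handedness TEXT)"
    )
    conn.executemany(
        "INSERT INTO candidate (id, core, handedness) VALUES (?, ?, ?)",
        [(1, "clinical", "R"), (2, "clinical", "L"), (3, "neuroimaging", "R")],
    )
    conn.commit()
    return conn


# --- Application logic layer: access control and query execution ---
PRIVILEGES = {"alice": {"clinical"}, "bob": {"clinical", "neuroimaging"}}


def candidates_for(conn: sqlite3.Connection, user: str,
                   core: str) -> List[Tuple[int, str]]:
    """Run the query only if the user may read data from this working core."""
    if core not in PRIVILEGES.get(user, set()):
        raise PermissionError(f"{user} may not read {core} data")
    cursor = conn.execute(
        "SELECT id, handedness FROM candidate WHERE core = ? ORDER BY id",
        (core,),
    )
    return cursor.fetchall()


# --- Front-end layer: presentation only, no SQL and no access rules ---
def render(rows: List[Tuple[int, str]]) -> str:
    return "\n".join(f"candidate {cid}: handedness {hand}" for cid, hand in rows)


conn = open_database()
print(render(candidates_for(conn, "alice", "clinical")))
```

Because the front-end function only receives rows, it could be replaced by a different interface without touching the query or the access rules, which is the point of the layered organization described above.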

[0347] (1) Database software platform 

[0348] The database is developed using MySQL, an Open Source Database 
Management System (see http://www.mysql.com). MySQL is a database management 
system that incorporates a relational model for its databases, and supports ANSI SQL 
(standard querying language). It is very flexible and supports compatibility with other 
database management systems. MySQL also supports ODBC (Open DataBase 
Connectivity, an industry standard application programming interface (API) for 
transparent database access) and JDBC (a Java API for executing SQL statements), hence 
making it possible to use MySQL as a back-end database to many different applications 
(e.g., MRI data processing pipeline, Microsoft Excel, Matlab, ...). MySQL's client/server 
architecture allows the development of various front-end interfaces with seamless 
connectivity to the Database servers. 

[0349] The MySQL architecture corresponds well to the requirements of the 
Phenotype-Genotype Study. The server (the MySQL daemon process mysqld) connects 
investigators by creating a new server process for each investigator. Investigators access 
the MySQL database exclusively through the mysqld process. Thus the MySQL database 
server (program) focuses only on data handling, while the mysqld processes take care of 
the investigator's connectivity and control his/her access privileges. 

[0350] The Graphical User Interface (GUI) to the database was developed to 
ensure data and structure flexibility, cross-platform independence, and transparent and 
full Internet support. It has been implemented as a web-based GUI, written primarily in 
PHP4 (http://www.php.net). PHP is a powerful and versatile server-side scripting 
language, featuring an extensive programming interface to MySQL. For certain 
operations and data manipulation tasks, PHP is complemented by software developed in 
Perl, JavaScript, or Java. 

[0351] For secure and automatic data transfer from any PC/workstation in the 
working cores to the central database, a combination of Unison 
(http://www.cis.upenn.edu/~bcpierce/unison/) and Secure Shell (SSH, 
http://www.ssh.com) is used. Unison is a file synchronizer, which efficiently 
synchronizes the data present on the laptop with a central data repository. This process is 
run through SSH to ensure secure, encrypted data transfer. 

[0352] (2) Database Layer 

[0353] The core of the management system for the database is a relational 
database with thousands of fields storing neurological, psychological (behavioral), and 
medical data (including genetics data), raw and derived scores, MRI scans and analyzed 
images, and MRI header information. 

[0354] The core of the database structure is the candidate profile, built around a 
study subject as a basic "data unit." A study subject is registered by an investigator at the 
clinical phenotyping working core. Some study subjects will undergo multiple visits to a 
particular working core (such as for test and retest scanning). During such a visit, a 
distinct battery of behavioral instruments and MRI procedures will be administered, both 
of which are age and study objective dependent. 

[0355] (3) Application Logic Layer 

[0356] The middle tier consists of MySQL-based user management functions 
(using a special "mysql" database to manage user accounts). This enables the mysql 
daemon processes to verify user accounts at connection time and keep track of their 
access privileges during their work with the database. This way, the workload to verify 
users is removed from both Database and front-end applications. 

[0357] At the same time, PHP (server-side), Perl, and Java scripts dynamically 
generate SQL to query the Database, receive and process resulting data sets and present 
them to the front-end applications. Since the database front-end delivers completely 
dynamic web-content that is displayed on the investigators' browsers, it is this application 
layer's job to define and deliver variables and rules for displaying the content. 

[0358] (4) Front-end Layer 

[0359] The front-end (GUI) layer was designed to mirror the project workflow, and its 
forms to resemble original layouts of the paper test forms, thereby making data entry 
highly intuitive. The main Menus of the GUI represent the actual candidate screening and 
data acquisition stages of the study: 1. Candidate Recruitment Stage/Menu (e.g., initial 
recruitment of the candidate to the project); 2. Candidate Screening Stage/Menu (e.g., 
further pre-visit screening of the candidate); 3. Candidate Visit Stage/Menu (e.g., 
candidate visit for clinical phenotyping); 4. Approval Stage/Menu (e.g., post-visit period 
for evaluation of collected data and administered neuro-psychological instruments). 

[00360] Other Menus bring User Management, Data Management, and Candidate 
Profile Management features: 1. Central Database Area (e.g., data management features, 
offers real-time monitoring of the data acquisition process at all sites, user-defined 
querying and displaying of various database statistics); 2. Candidate Information (e.g., 
candidate profile management features); 3. User Information (e.g., user personal and 
contact information); 4. Administration (e.g., various administrative tasks, from 
registering new users to changing access privileges for users or groups). 

[00361] The MRI and behavioral battery of instruments will be clearly displayed in 
the candidate profile menu. Data entry and evaluation status of all instruments will offer 
easy review of the status of work with each candidate enrolled in the project. 

[00362] (c) Data Entry Protocols and Procedures 

[00363] The data entry aspect of the database has been designed along two 
principles: make data entry easy, and make it accurate. To make data entry easy, the 
online forms have been designed to resemble as much as possible the paper forms that 
researchers are used to working with. The same headers, titles and layouts as the paper 
tests will be provided online in many cases, and where they are not, clear instructions will 
be written to smooth any transitional problems. Data entry in this environment is 
extremely fast, and typically takes only a few minutes for even the longest measures. 
Many shorter measures take only moments to enter data, and feedback (including scores) 
is immediate. 

[00364] To make data entry accurate, the online forms provide several basic levels 
of quality control. They limit the entry options of nearly every field, making 
unreasonable values impossible to enter. They provide immediate feedback to the data 
entered, and allow investigators to easily check any and all of their entries. Finally, 
trained personnel will explicitly verify a randomly selected subset of the data entered 
against paper originals. 
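
A type and range check of the kind described above might look like the following sketch. The field names, limits, and error messages are hypothetical and are not taken from the actual forms; the point is only that each field carries a rule and an out-of-range or mistyped entry is rejected immediately with feedback.

```python
from typing import Optional, Tuple, Union

Number = Union[int, float]

# Hypothetical per-field rules: (type, minimum, maximum).
FIELD_RULES = {
    "age_years": (int, 18, 65),
    "end_expiratory_co_ppm": (float, 0.0, 100.0),
}


def validate_field(field: str, raw: str) -> Tuple[Optional[Number], Optional[str]]:
    """Return (value, None) if the entry is acceptable, else (None, message)."""
    kind, low, high = FIELD_RULES[field]
    try:
        value = kind(raw)
    except (TypeError, ValueError):
        return None, f"{field}: expected {kind.__name__}, got {raw!r}"
    if not low <= value <= high:
        return None, f"{field}: {value} outside allowed range {low}-{high}"
    return value, None


print(validate_field("age_years", "34"))
print(validate_field("age_years", "340"))
```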

[00365] Data entry on the commercial software integrated into the project is a 
more complex issue. Each commercial software package has its own protocols for data 
entry, but when exported information arrives at the central database, it is run through a 
standardized battery of checks. Primarily, these checks involve verification of the 
candidate identity (does this file belong to the correct subject), and of basic information 
content (does this file contain the information that it should). After these checks have 
been done, the data is subject to the same quality control as the data arriving via the 
online interface. 

[00366] Note that given Ph1 and Ph2 are primarily focused on research at one 
center, the data entry system will feature double entry of data, a standard procedure for 
maintaining data quality. For Ph1 and Ph2, this database will perform double scoring of 
the behavioral instruments, thus providing quality control by comparing the summary 
scores only. For Ph3, this feature may be developed to the point that it can be utilized 
across multiple centers. 
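
Comparing summary scores from two independent scorings reduces to a simple reconciliation step. The sketch below is one hypothetical way to flag disagreements; the function name, the dictionary layout, and the example scores are assumptions for illustration.

```python
from typing import Dict, List


def score_mismatches(first: Dict[str, float], second: Dict[str, float],
                     tol: float = 0.0) -> List[str]:
    """Instruments whose summary scores differ, or that only one scoring contains."""
    flagged = []
    for instrument in sorted(set(first) | set(second)):
        a = first.get(instrument)
        b = second.get(instrument)
        if a is None or b is None or abs(a - b) > tol:
            flagged.append(instrument)
    return flagged


# Hypothetical summary scores from two independent scorings of one subject.
scoring_1 = {"IDS": 12, "WAIS-R": 101}
scoring_2 = {"IDS": 12, "WAIS-R": 110}
print(score_mismatches(scoring_1, scoring_2))
```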

[00367] (d) Data Transfer Protocols and Procedures 

[00368] The proposed data transfer mechanism for the phenotype-genotype project 
calls for a study workstation for each investigator in the four working cores to function as 
an extension of the central database, effectively constituting a data gateway between 
them. In this scenario, all acquired study data, be it clinical/behavioral, MRI, or 
neurogenetic measures, flows from acquisition through the workstation to the central 
database. 

[00369] Although data transfer is technically possible using currently available 
mechanisms, it should be noted that for MRI data this procedure can be cumbersome and 
require additional human resources. Specifically at the central database, the verification, 
QC, and format conversion of MRI data requires significant manual intervention. 

[00370] For data transfer purposes, this study has three primary categories of data: 
(1) clinical/behavioral data from paper-and-pencil tests and computerized tests (e.g., 
SCID-I/P); (2) structural and functional MRI data; and (3) molecular genetics data. 

[00371] These data types can be acquired in different ways, and may require 
slightly different treatment for storage, archiving, backup, database entry and transfer. In 
the following, the procedures around the clinical/behavioral and MRI data are described, 
given these are the most complex, or the largest data sets, respectively, in the project. 

[00372] The data transfer mechanism has all data acquired or analyzed at the 
working cores travel via a laptop/workstation (or from the MRI scanner to a workstation 
in the neuroimaging working core for offline reconstruction of images) to the central 
database. For two of the data categories, these procedures are summarized as follows: 

[00373] Clinical/behavioral tests: (a) One set of tests is administered to the subjects 
using standard, paper-and-pencil test forms. The data contained on these forms are to be 
entered into the Database using a data entry interface provided by central database, which 
can be accessed over the Internet. Note that data entry will not be limited to a single 
laptop/workstation; other computers at the working core will be usable for data entry. (b) 
Another set of tests is computerized and administered using a laptop with a battery of 
computerized tests. The data generated by these instruments are initially stored in the 
internal representation of each individual software package. Following test 
administration, the test results are manually 'exported' to a format usable for transfer to 
the Database. Each laptop/workstation will be configured with an upload mechanism that 
automatically transfers the exported data to the Database. 

[00374] Structural and functional MRI data: scans are acquired at the MRI console, 
and from there 'pushed' to the Workstation using DICOM transfer. From the 
Workstation, they are subsequently sent to central database using a similar, encrypted 
DICOM transfer mechanism. 

[00375] (e) Data Confidentiality, Security, and Accessibility 

[00376] The database is designed with a number of features that control access to 
the database and ensure subject confidentiality. 

[00377] (f) Quality Control for Data Integrity

[00378] Four different levels or stages of quality assurance and quality control 
have been designed into the dataflow: 

[00379] (1) At the working core, during and after the data entry. The behavioral 
data are checked automatically for validity, type, and range as the data are entered in the 
on-screen test forms. The MRI scans are visually checked at the MR console. 

[00380] (2) At the working core, before the data is transferred to the central database.
For behavioral data, this includes an explicit data entry/completeness check. The tests are 
displayed in the order of administration, making it easier to monitor the data entry 
process. Once the user enters all the data for a certain instrument, he/she has to mark that 
the data entry is completed. This informs other users that the test's data entry is
completed and disables anyone else but that user from editing the entered data (with the
exception of the working core PI, who has the authority to access and modify all data of 
his/her working core). For MRI data, this QC stage consists of a qualitative evaluation of 
the data during pre-processing and before statistical evaluation, using visualization 
software that allows multiple simultaneous cross-sectional views. 

[00381] Once a test's data entry or MRI acquisition has been completed and
checked as such, the authorized user may evaluate the instrument and mark it as
"Completed PASS" or "Completed FAILURE". If the instrument was not administered
for some reason, it may also be marked as "Not Administered." Simultaneously, a record
(related to the QC level) is updated, while an entry (comment) is inserted into the
comment history table of the Database. Each time this is done, the QC flag table gets
updated, keeping the latest entry, while the comment history table keeps the
chronological listing of all comments. This provides a complete audit trail, recording 
exactly what was done with the data throughout the course of the study. 
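The relationship between the QC flag table and the comment history table can be sketched as follows. This is a minimal in-memory illustration, not the database schema of the described system: the class name `QCStore`, the field names, and the use of a dictionary and a list in place of database tables are all assumptions made for the example. Only the three status strings are taken from the text.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Status values named in the text; everything else here is hypothetical.
STATUSES = {"Completed PASS", "Completed FAILURE", "Not Administered"}


@dataclass
class QCStore:
    """In-memory stand-in for the QC flag table and the comment history table."""
    qc_flags: dict = field(default_factory=dict)         # latest entry per key
    comment_history: list = field(default_factory=list)  # append-only log

    def mark(self, subject, instrument, qc_level, status, user, comment=""):
        """Record a QC decision: overwrite the flag, append to the history."""
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        entry = {
            "subject": subject, "instrument": instrument, "qc_level": qc_level,
            "status": status, "user": user, "comment": comment,
            "timestamp": datetime.now(timezone.utc),
        }
        # The flag table keeps only the latest entry for this key...
        self.qc_flags[(subject, instrument, qc_level)] = entry
        # ...while the history keeps every entry in chronological order.
        self.comment_history.append(entry)
        return entry

    def audit_trail(self, subject, instrument):
        """Return all recorded entries for one instrument, oldest first."""
        return [e for e in self.comment_history
                if e["subject"] == subject and e["instrument"] == instrument]
```

In this sketch, marking the same instrument twice leaves one row in `qc_flags` (the latest) and two rows in `comment_history`, which is the audit-trail behavior the paragraph describes.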

[00382] (3) At the central database, upon receipt of the data. This stage
verifies the integrity and completeness of the received data and MRI scans, i.e., whether the
received files were correctly transmitted, whether the data is complete, and whether the
correct acquisition parameters were used. 
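One way such a receipt check could look is sketched below. The text does not say how transmission correctness is verified; the use of SHA-256 checksums, the manifest layout, and the names `verify_receipt`, `manifest`, `received_files`, and `received_params` are assumptions made purely for illustration.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def verify_receipt(manifest, received_files, received_params):
    """Return a list of problems; an empty list means the check passed.

    manifest:        {"files": {name: sha256_hex}, "acquisition": {param: value}}
    received_files:  {name: bytes}
    received_params: {param: value}
    (Hypothetical layout, not taken from the described system.)
    """
    problems = []
    for name, expected in manifest["files"].items():
        if name not in received_files:
            # Completeness: every expected file must have arrived.
            problems.append(f"missing file: {name}")
        elif sha256_of(received_files[name]) != expected:
            # Transmission: the content must match the sender's checksum.
            problems.append(f"corrupted file: {name}")
    for key, expected in manifest["acquisition"].items():
        # Acquisition parameters must match what the protocol specifies.
        if received_params.get(key) != expected:
            problems.append(f"unexpected acquisition parameter: {key}")
    return problems
```

Returning a list of problems rather than a single pass/fail value lets one report cover all three questions in the paragraph (transmission, completeness, acquisition parameters).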

[00383] (4) At the central database, following data receipt and integrity check. 
This level of Quality Control is the most comprehensive, in-depth verification of all 
received information for a study subject. The validation at this QC level initiates the 
candidate's "promotion" into the status of a full subject, when a study-wide, unique Subject
ID is assigned to it. For behavioral data, this involves a complete verification of all data 
against source documents (paper forms) on a random subset of candidates, and rapid data 
consistency checks of all data. For MRI data, this involves the qualitative and 
quantitative assessment of image quality.

[00384] (g) Central Database Organizational Structure

[00385] The central database is organized into multiple separate domains of
activity for each of the types of data to be incorporated in it (thus approximating the 
structure for the four working cores). 

[00386] Other embodiments of the invention are within the following claims.

WE CLAIM:

1. A datastructure comprising:

a) genetic information that describes a plurality of genetic markers on at
least two different, non-homologous chromosomes of a subject or a reference to such
information; and

b) a systems biology map of the subject or a reference to such a map, e.g.,
wherein the map comprises information about neural circuit function in the brain.

2. A datastructure comprising: 

a systems biology map of a subject wherein the map comprises 
quantitative information about neural circuit function in the brain, the information
indicating function of a plurality of regions of the brain during a plurality of mental 
processes. 

3. A datastructure comprising: 

a systems biology map of a subject wherein the map comprises a plurality
of values corresponding to a set of continuous variables, wherein the variables of the set 
correspond to different regions of the brain, and the values that correspond to the 
variables indicate function of respective regions during a mental process. 

4. The datastructure of claim 1 wherein the genetic information comprises 
information about nucleotide identity for a plurality of genetic markers. 

5. The datastructure of claim 1 wherein the genetic information comprises 
information about methylation status for a plurality of genetic markers. 

6. The datastructure of claim 1 wherein the genetic information comprises 
information about parental origin for a plurality of genetic markers.

7. The datastructure of claim 1 wherein the genetic information comprises
information about chromatin structure or accessibility for a plurality of genetic markers.

8. The datastructure of claim 1 wherein the genetic information comprises 
information about a haplotype, microsatellite marker, sequence tagged site, or SNP. 

9. The datastructure of claim 1 wherein the genetic information comprises
information about a chromosomal deletion, inversion, transversion, rearrangement, 
trisomy, or other chromosomal abnormality. 

10. The datastructure of claim 1 further comprising c) information that is an 
index corresponding to the subject. 

11. The datastructure of claim 10 wherein the index corresponding to the subject
is randomized, encrypted, or anonymous.

12. The datastructure of claim 10 wherein the index corresponding to the subject
identifies the subject. 

13. The datastructure of claim 10 wherein the index corresponding to the subject 
associates the subject with familial or other pedigree information. 

14. The datastructure of claim 1 wherein the systems biology map comprises 
information obtained by imaging.

15. The datastructure of claim 1 wherein the systems biology map comprises 
structural information. 

16. The datastructure of claim 1 wherein the systems biology map comprises 
functional information.

17. The datastructure of claim 1 wherein the systems biology map comprises
information about activity in a plurality of brain regions in at least one paradigm.

18. The datastructure of claim 1 wherein the systems biology map comprises
information about activity in a plurality of brain regions in at least two paradigms. 

19. The datastructure of claim 1 wherein the plurality of brain regions comprises 
at least ten, twenty, or thirty brain regions. 

20. The datastructure of claim 19 wherein at least ten, twenty, or thirty of the 
brain regions of the plurality are selected from Table 1.

21. The datastructure of claim 1 wherein the information for each of the brain 
regions is independent of reference to a coordinate frame. 

22. The datastructure of claim 1 wherein the information for the brain regions is 
organized categorically. 

23. The datastructure of claim 21 wherein the information for each of the brain
regions is indexed according to index values for each of a set of predefined regions.

24. The datastructure of claim 17 wherein the paradigm interacts with the
informational backbone for motivation.

25. The datastructure of claim 24 wherein the paradigm interacts with a
reward/aversion mechanism in a normal subject.

26. The datastructure of claim 17 wherein the at least two paradigms interact
with overlapping, but non-coextensive regions of the brain.

27. The datastructure of claim 17 wherein the at least one paradigm is selected
from the set consisting of: 

a social reward paradigm, 

a CPT / probability paradigm, 

a physiological aversion / pain paradigm, 

a mental rotation paradigm, 

an emotional faces paradigm, and 

a monetary reward paradigm. 

28. The datastructure of claim 17 wherein the information about activity for at
least one of the regions comprises deviations from a reference (e.g., percentage 
differences, ratios, and subtractive values). 

29. The datastructure of claim 1 wherein the systems biology map comprises a
plurality of matrices, each matrix comprising information about neural activity in a
plurality of defined brain regions during different paradigms. 

30. A database comprising: a plurality of records, wherein each record of the
plurality includes: the datastructure of claim 1; 

and the records of the plurality include records for a plurality of unrelated 
individuals and records for at least one family member of each of the plurality of 
unrelated individuals. 

31. A database comprising: a plurality of records, wherein each record of the
plurality includes: the datastructure of claim 1, wherein the database comprises records
for at least 50, 100, 200, 500, 1000, 3000 or 30,000 human subjects.

32. The database of claim 31 wherein one or more of the subjects has a clinical
diagnosis of a neurological and/or psychiatric disorder.

33. The database of claim 31 wherein one or more of the subjects has a clinical
diagnosis of schizophrenia, manic depression, bipolar disorder, addictions, obsessive-
compulsive disorder, anxiety/paranoia, autism, schizo-affective disorder, delusional
disorder, psychosis, antisocial personality disorder, or anorexia/bulimia nervosa. 

34. A method of evaluating information about neural processing, the method 
comprising: 

providing a database that comprises structural and/or functional information about 
brain activity for each of a plurality of subjects; and 

classifying the subjects based on the information. 

35. The method of claim 34 wherein the database comprises information about 
brain activity during at least two different mental processes. 

36. The method of claim 34 wherein the classifying comprises selecting a subset
of variables, and sorting the subjects as a function of the variables of the subset. 

37. The method of claim 36 wherein the subset of variables is selected based on 
the information content of each of the variables. 

38. The method of claim 36 wherein the subset of variables is selected based on 
correlations among the variables. 

39. The method of claim 36 wherein each variable is associated with an activity
of a particular region of the brain during a paradigm.

40. The method of claim 34 wherein the classifying comprises generating a tree. 

41. The method of claim 40 wherein the tree is a binary tree.

42. The method of claim 40 wherein each node of the tree corresponds to a
variable associated with a particular region of the brain and a paradigm. 

43. The method of claim 34 wherein the plurality of subjects comprises at least 
50, 100, 200, 500, 1000, or 3000 human subjects.

44. The method of claim 34 wherein the classifying is recursive. 

45. The method of claim 34 wherein the classifying comprises generating an 
association rule algorithm. 

46. The method of claim 45 wherein the association rule algorithm is non- 
parametric. 

47. The method of claim 34 wherein the classifying comprises classification tree
analysis. 

48. The method of claim 34 further comprising comparing genetic information 
among subjects of at least one class. 

49. The method of claim 48 wherein the comparing of genetic information 
comprises evaluating a statistic for association of one or more genetic markers among the 
subjects of the at least one class.

50. The method of claim 34 wherein the information comprises quantitative 
volumetric data evaluated by tomography. 

51. The method of claim 34 wherein the quantitative volumetric data comprises
a plurality of matrices.

52. The method of claim 34 wherein the classifying comprises hierarchical
clustering, Bayesian clustering, k-means clustering, self-organizing maps, or shortest path
analysis.

53. The method of claim 34 wherein the subjects are social non-human animals. 

54. The method of claim 34 wherein the subjects are humans or non-human
primates. 

55. The method of claim 34 wherein the subjects are voles. 

56. A method comprising: 

providing a database that comprises quantitative information about brain 
function for each of a plurality of subjects; 

objectively identifying a subset of subjects from the plurality of subjects 
according to similarity of brain function. 

57. The method of claim 56 wherein a plurality of subsets are objectively 
identified. 

58. The method of claim 56 wherein the identifying comprises objectively
selecting a subset of quantitative variables whose values vary among the plurality of
subjects.

59. The method of claim 56 further comprising receiving additional quantitative
information about brain function for at least one additional subject, and evaluating
whether the additional subject is a member of the identified subset.

60. The method of claim 56 wherein the identifying comprises generating one or
more association rules that model the subset.

61. The method of claim 56 wherein the identifying comprises generating a
decision tree that models the subset. 

62. The method of claim 56 wherein the identifying comprises generating a 
probability function that models the subset. 

63. The method of claim 56 wherein the database comprises systems biology
maps.

64. The method of claim 63 wherein the systems biology maps comprise values
determined by evaluating subjects during at least two different mental processes.

65. A data-tree comprising a plurality of nodes, wherein each non-terminal node 
includes (i) a reference to a variable or variable class, wherein the variable or variable 
class is a parameter of brain function in the subject, (ii) optionally, a node level, and (iii) 
a criterion for distinguishing descendants of the node.

66. The tree of claim 65 wherein the tree is a binary tree. 

67. The tree of claim 65 wherein each non-terminal node comprises a pointer to 
one or more descendant nodes. 

68. The tree of claim 65 wherein, for at least some of the nodes of the plurality, 
the criterion is an association rule. 

69. The tree of claim 65 wherein each descendant node comprises a probabilistic 
or statistical function that differentiates it from a sibling descendant node.

70. The tree of claim 65 wherein the nodes are ordered as a function of variables
that they respectively reference, e.g., as a function of information content or
autocorrelations for the respective variables.

71. The tree of claim 65 wherein at least one of the variables or variable classes
refers to a brain region in a paradigm. 

72. A datastructure comprising a plurality of matrices, wherein each matrix 
comprises functional information obtained during a mental process of a subject, the 
matrix comprising at least two dimensions, a first dimension that identifies regions of the 
brain, and one or more values for each region, wherein the values correspond to activity 
levels in the respective regions during the mental process. 

73. The datastructure of claim 72 wherein a second dimension identifies a 
hemisphere. 

74. The datastructure of claim 72 that comprises a first matrix that comprises 
functional information obtained during a first paradigm and a second matrix that
comprises functional information obtained during a second paradigm.

75. The datastructure of claim 72 that comprises a first matrix that comprises
first values that depend on a native dataset obtained by imaging the subject at multiple 
timepoints, wherein the first values are independent of information from other subjects 
and a second matrix that comprises second values that depend on the same native dataset, 
wherein the second values are determined or are selected as a function of information 
from other subjects. 

76. The datastructure of claim 75 wherein the second values are selected based 
on location of activation centers detected in an aggregate of image information from a 
plurality of other subjects.

77. The datastructure of claim 75 wherein the first values are determined and/or
selected as a function of location of activation centers detected by clustering signal
changes from a baseline, wherein the signal changes are independent of information from
any other subject. 

78. A method of providing a systems biology map, the method comprising: 

providing native information about brain function of a subject during a 
mental process, the information comprising quantitative data for signals in at least a
plurality of regions; 

comparing signals during the mental process to reference signal 
parameters to locate regions of activity; and 

populating a datastructure with information about signals at least in the 
regions of activity. 

79. The method of claim 78 wherein the reference signal parameters are a function
of a baseline for the subject.

80. The method of claim 78 wherein the reference signal parameters are a
function of signals from a population of subjects.

81. The method of claim 78 wherein locating regions of activity comprises
clustering signal changes relative to the reference signal parameters.

82. The method of claim 81 wherein the clustering comprises defining foci in a
three-dimensional coordinate space.

83. The method of claim [...] statistical map.

84. The method of claim 83 wherein the statistical maps are a function of 
correlation between a gamma function and signal changes. 






85. A method of providing a systems biology map, the method comprising: 

providing native datasets about brain function for a plurality of subjects 
during a mental process, the information comprising quantitative data for signals in at 
least a plurality of regions; 

combining information from the native datasets to provide an aggregate
dataset; and

localizing regions of activity in the aggregate dataset. 

86. The method of claim 85 wherein the combining comprises transforming
native datasets to a reference coordinate frame.

87. The method of claim 86 wherein the combining further comprises averaging
the native datasets.

88. The method of claim 86 wherein the combining further comprises producing
a statistical map.

89. The method of claim 85 wherein the localizing comprises clustering signal
changes in the aggregate dataset. 

90. A method of providing a systems biology map, the method comprising: 

providing native datasets about brain function for a plurality of subjects 
during a mental process, the information comprising quantitative data for signals in at 
least a plurality of regions; 

for each subject, producing a first systems biology map from the native 
dataset of the particular subject, wherein the first systems biology map is independent of
the native datasets from the other subjects, and a second systems biology map that is a
function of regions of activity detected in an aggregate dataset from the plurality of
subjects.

91. A method of diagnosing a subject, the method comprising:

providing information about structure and/or function of the brain of the 
subject, the information comprising quantitative data for at least a plurality of regions; 
and 

objectively evaluating the information using quantitative criteria; and 
providing a diagnosis for the subject based on results of the evaluating. 

92. The method of claim 91 wherein the quantitative data comprises information
about brain function during a plurality of mental processes. 

93. The method of claim 92 wherein at least one mental process comprises a 
paradigm. 

94. The method of claim 93 wherein the paradigm evokes the information 
backbone for motivation. 

95. The method of claim 91 wherein the evaluating comprises comparing the 
information about the subject to a decision tree. 

96. The method of claim 95 wherein the comparing comprises evaluating a 
probability of association for the information about the subject and one or more terminal 
nodes of the tree. 

97. The method of claim 95 wherein the comparing comprises evaluating a 
probability of association for the information about the subject and each bifurcation of 
the tree. 

98. The method of claim 91 wherein the evaluating comprises evaluating a 
probability that the information about the subject is within a classification, wherein the 
classification is a function of quantitative activity measures for a plurality of brain 
regions.

99. A method of providing a systems biology map, the method comprising

imaging regions of the brain of a subject while at least one of the regions 
is active to obtain a native dataset that includes information about activity in one or more 
of the regions at a plurality of temporal instances; and 

condensing the native dataset at least 10 fold to provide a condensed 
dataset that comprises quantitative information about at least some of the imaged regions.

100. The method of claim 99 wherein the condensed dataset comprises
information about one or more activity peaks in at least some of the imaged regions.

101. The method of claim 99 wherein the condensed dataset discards time 
resolution for at least 50% of the regions. 

102. The method of claim 99 wherein the regions are imaged by fMRI. 

103. The method of claim 99 wherein the condensed dataset comprises
information that can be represented as a matrix, one dimension of which differentiates 
among regions of the brain. 

104. A method of providing a systems biology map, the method comprising 

imaging regions of the brain of a subject during a mental process to obtain
a native dataset that includes information about brain function; and 

populating variables in a matrix by extracting quantitative information 
from the native dataset. 

105. The method of claim 104 wherein the matrix comprises at least two
dimensions. 

106. The method of claim 104 wherein the first dimension resolves different 
regions of the brain. 






107. The method of claim 104 wherein the second dimension resolves the left
and right hemisphere of the brain. 

108. The method of claim 104 wherein the matrix comprises a third dimension.

109. The method of claim 104 wherein information about one or more
activations in a given region and hemisphere are provided at respective variables of the
matrix.

110. The method of claim 104 wherein the information comprises a list, the
members of the list being stored at different positions along a third dimension of the
matrix.

111. The method of claim 104 wherein the matrix does not provide information
about time, e.g., the information about the one or more activations is not time-resolved. 

112. A method of providing a systems biology map, the method comprising 

receiving a native dataset that includes imaged information about brain 
function of a subject; and 

populating variables in a matrix by extracting quantitative information 
from the native dataset. 

113. A method of providing a systems biology map, the method comprising 

imaging regions of the brain of a plurality of subjects; and 

transforming image information to a reference coordinate space; 

selecting a subset of regions for which activations are detected among the 
plurality of subjects; and 

producing a condensed dataset for each subject of the plurality wherein the 
condensed dataset is smaller than the native dataset for each subject of the plurality and
retains information about the selected subset of regions.

114. The method of claim 113 wherein selecting the subset comprises averaging
the transformed image information and evaluating statistically significant changes
relative to results of the averaging. 

115. The method of claim 113 wherein selecting the subset comprises selecting 
regions that differ from a reference (e.g., a baseline obtained prior or after the mental 
process). 

116. A method comprising: 

receiving functional information about neural circuit activity, the information 
being obtained by imaging a plurality of brain regions in a subject.

generating a dataset that associates each of a plurality of brain regions
with quantitative information, wherein the quantitative information comprises lists of
activation peaks and each list is associated with at least one of the brain regions. 

117. The method of claim 116 wherein the list is rank ordered.

118. The method of claim 116 wherein the dataset is represented as a matrix.

119. The method of claim 116 wherein the dataset is represented as a vector.

120. The method of claim 118 wherein members of each list are positioned or
referenced in consecutive cells along one axis of the matrix.

121. The method of claim 116 wherein the dataset is stored in a relational
database, e.g., as a table. 

122. A method for evaluating a treatment, the method comprising: 
evaluating a subject to produce a first systems biology map of the subject; 
treating the subject; and
evaluating the subject to produce a second systems biology map of the subject;

wherein the first and second systems biology maps comprise quantitative 
information about brain function in a plurality of brain regions during at least one mental 
process. 

123. The method of claim 122 wherein the treatment comprises administering an 
agent to the subject. 

124. The method of claim 123 wherein the agent is a pharmaceutical, a narcotic,
an addictive substance, or a non-addictive substance.

125. The method of claim 122 wherein the treatment comprises providing a non-
invasive therapy to the subject.

126. The method of claim 125 wherein the non-invasive treatment comprises
hypnosis, music, video, visual, superficial contacts, exercise, or physical pressure.

127. The method of claim 122 wherein the systems biology maps comprise
information about activity in a plurality of brain regions in at least one paradigm.

128. The method of claim 122 wherein the systems biology maps comprise
information about activity in a plurality of brain regions in at least two paradigms.

129. The method of claim 122 wherein the plurality of brain regions comprises
at least ten, twenty, or thirty brain regions.

130. The method of claim 122 wherein at least ten, twenty, or thirty brain
regions of the plurality are selected from Table 1.

131. The method of claim 122 wherein the information for each of the brain
regions is independent of reference to a coordinate frame.

132. The method of claim 122 wherein the information for the brain regions is
organized categorically.

133. The method of claim 122 wherein the information for each of the brain
regions is indexed according to index values for each of a set of predefined regions. 

134. The method of claim 127 wherein the paradigm triggers the informational 
backbone for motivation. 

135. The method of claim 127 wherein the paradigm triggers reward/aversion 
mechanism in a normal subject. 

136. A method comprising: 

providing a dataset that comprises quantitative information about brain 
activity during at least two paradigms for each of a plurality of subjects; 

evaluating a parameter that is a continuous function of at least two 
components of the quantitative information, the at least two components being associated 
with different paradigms; and 

analyzing a statistic for association between the parameter and an allele 
for one or more genetic loci. 

137. The method of claim 136 wherein analyzing the statistic comprises non- 
parametric linkage analysis. 

138. A method of evaluating information, the method comprising:

obtaining a group of human subjects;

imaging the CNS of each subject while the respective subject is exposed to
information;

evaluating correlation between a characteristic of neural circuit activity of
the subjects and alleles present at one or more genetic markers; and

providing an evaluation of the information as a function between the
characteristic and the frequency of an allele in a population. 
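
By way of illustration only, the matrix recited in claims 104 to 111 (a first dimension for brain regions, a second for hemisphere, and a third holding a list of activations without time resolution) might be laid out as in the following sketch. The region labels, the cap of three peaks per cell, the descending rank order (borrowed from claim 117), and the function name `build_matrix` are assumptions made for the example and are not taken from the claims.

```python
REGIONS = ["region_a", "region_b", "region_c"]  # placeholder labels, not from Table 1
HEMISPHERES = ["left", "right"]
MAX_PEAKS = 3  # assumed depth of the third dimension


def build_matrix(peaks):
    """Build matrix[region][hemisphere] = fixed-length list of peak values.

    peaks: iterable of (region, hemisphere, value) tuples with no time axis.
    Each cell is rank ordered (largest first) and padded with None.
    """
    matrix = [[[] for _ in HEMISPHERES] for _ in REGIONS]
    for region, hemisphere, value in peaks:
        cell = matrix[REGIONS.index(region)][HEMISPHERES.index(hemisphere)]
        cell.append(value)
    for row in matrix:
        for h, cell in enumerate(row):
            ranked = sorted(cell, reverse=True)[:MAX_PEAKS]
            # Pad so every cell has the same length along the third dimension.
            row[h] = ranked + [None] * (MAX_PEAKS - len(ranked))
    return matrix
```

With this layout, `matrix[r][h][k]` addresses the k-th ranked activation for region `r` in hemisphere `h`; a relational table with the same three keys would carry equivalent information.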